I. INTRODUCTION


Cellular telephones have become ubiquitous. Teenagers carry them to school, and adults carry them to work. They provide connection and communication, information and entertainment. In the U.S., 93% of the population has access to a cell phone, and 24.5% of households have abandoned the landline to use cellular only. Along with the cell phone, the short message service (SMS) has also gained popularity. Americans sent 7.2 billion SMS messages a month in 2005; in 2010, that value increased to 173.2 billion a month. The annualized value of 1.81 trillion text messages a year comes close to matching the 2.26 trillion minutes of cell phone use in 2010 [1]. SMS messages are an integral part of modern communication.
A. IDENTITY ISSUES 


The benefits and convenience of SMS messaging, however, bring new difficulties in establishing human identity. For example, one can answer a phone call and immediately recognize one’s sister on the other end of the line by the sound of her voice. Upon receiving a text message from one’s sister, however, one cannot tell whether she typed it herself or had her husband key the message while she was driving. While this is an innocuous example of an identity mismatch, it is easy to imagine more malicious behavior.





Identity is a crucial part of network security. Devices communicate their identity to a network at the link layer in the form of a media access control (MAC) address; cell phones on a Global System for Mobile Communications (GSM) network use an international mobile equipment identity (IMEI). A sophisticated adversary can falsify or “spoof” these identification codes to appear as a different device. Users authenticate to the network at the application layer using passwords or biometric information. Passwords have well-known vulnerabilities if they are not carefully selected, and biometrics have not achieved widespread use. Users can access web-based applications from any internet-capable device, allowing independence from a specific platform.


In cell phone networks, the provider requires the user to have a physical token, in the form of a registered phone or subscriber identity module (SIM) card, to gain access to the network. Even this notion of “registration” is not uniformly employed. Legislators in the Philippines introduced a bill in January 2011 regulating the sale and distribution of SIM cards. Currently, pre-paid SIM cards and cellular phones can be purchased in the Philippines and many other countries without having to provide any identification or register a legal name with a network provider. More trivially, phones may also be lost or misappropriated. Thus, it is difficult to tie a cell phone used in an illegal activity, such as a kidnapping, to its user [2].


A registration system may improve accountability in cell phone use, but policy alone cannot guarantee that the person named in the database for a phone is the person using that phone at any given time. This identity uncertainty can also be problematic in situations that do not involve illegal activities. A business that issues cell phones to its employees may not want those phones used for non-work-related communications. A government agency may want an unobtrusive way to ensure that an employee has not lost or loaned his phone to a family member. In these situations, an authority wants to establish and monitor a device-to-user binding, associating a specific user with a specific phone. Beyond security, a contextually aware phone might display specific information or behave differently depending on its user. We propose that it is possible to identify the user of a mobile wireless device based on statistical analysis of the user’s text messaging characteristics and the phone’s radio transmission signals.





B. RESEARCH QUESTIONS 


This thesis addresses two questions related to identity determination on mobile devices. We first examine whether combining user-specific text authorship characteristics and device-specific signal characteristics in a naive Bayes classifier improves upon the accuracy of classifying each set of characteristics individually. The second question asks whether this classifier can detect when a phone normally used by one individual begins to be used by a different individual. We use an authorship attribution analysis of the text of short messages as the user classifier, and an analysis of signal modulation characteristics as the device classifier.





C. SIGNIFICANT FINDINGS


This research produced the following significant results:

• Classification of 120 individual Twitter messages from 50 authors using a multiclass naive Bayes classifier produced 40.3% authorship attribution accuracy, less than the 54.4% found by Layton, Watters, and Dazeley using the Source Code Author Profiles (SCAP) method [3].

• Combining multiple Twitter messages to generate a text feature vector for input to the classifier improves authorship attribution accuracy. Using a feature vector from 23 combined messages produces the best result of 99.6% accuracy.

• Classification of 120 individual cell phone radio signal modulation characteristic vectors for 20 GSM cell phones resulted in 90% classification accuracy. This compares favorably to the 99% accuracy of Brik et al. for modulation characteristics of 802.11 devices [4].

• Sum rule combination of the text and phone classifiers improves upon the results of the text classifier. Multimodal classifier accuracies over 99% were attained when the individual classifiers employed the method of combining multiple messages to create the input feature vectors.

• The multimodal classifier was able to detect a simulated new user on a phone 36% of the time in the best-performing configuration.
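The sum rule named in these findings is a standard way of fusing classifiers: each classifier reports a posterior probability for every candidate label, the probabilities are added label by label, and the label with the largest total wins. The Python sketch below illustrates only that rule, not the full multimodal classifier used in this thesis; it assumes both classifiers score the same set of enrolled user/phone pairs, and the label names and posterior values are hypothetical.

```python
def sum_rule(*posteriors: dict[str, float]) -> str:
    """Fuse classifiers with the sum rule.

    Each argument maps every candidate label to one classifier's posterior
    probability; the label with the largest summed posterior is returned.
    All classifiers are assumed to score the same set of labels.
    """
    labels = posteriors[0].keys()
    totals = {label: sum(p[label] for p in posteriors) for label in labels}
    return max(totals, key=totals.get)


# Hypothetical posteriors over three enrolled user/phone pairs.
text_posteriors = {"alice": 0.40, "bob": 0.35, "carol": 0.25}
phone_posteriors = {"alice": 0.10, "bob": 0.80, "carol": 0.10}

print(max(text_posteriors, key=text_posteriors.get))  # text alone: alice
print(sum_rule(text_posteriors, phone_posteriors))     # combined: bob
```

In this example the weakly confident text classifier alone would choose alice, but the confident phone classifier shifts the combined decision to bob.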














D. ORGANIZATION OF THESIS


This thesis is organized as follows:

• Chapter I discusses the difficulty of ascertaining identity on mobile devices and the research questions we address in our experimentation.

• Chapter II discusses prior work in authorship attribution, device identification, and the machine learning techniques used in this study.

• Chapter III describes the methods used to collect and process data and to set up and execute the classification experiments.

• Chapter IV contains the results of the experiments and an analysis of their significance.

• Chapter V contains conclusions drawn from the results and possible areas of future research.
II. BACKGROUND


A. INTRODUCTION 


Developing a binding between a user and a device involves merging two efforts: classifying the user by applying authorship attribution methods (e.g., statistical word counts or social network structure), and classifying the device using the characteristics of its wireless signal. This chapter describes the textual and signal domains that provide our data. We discuss authorship attribution and device identification techniques, followed by an overview of machine learning classification methods. A description of the software tools used in this research concludes the chapter.


B. TWITTER 


Twitter provides a popular “microblogging” service that allows users to communicate with messages of 140 characters or fewer, known as tweets. Users subscribe to another user’s message “feed” to “follow” that user, receiving the messages that user posts. Twitter also provides mechanisms for users to reply to a tweet, send a direct message to another user, or repeat a received tweet to their own set of followers, thereby expanding the readership of that tweet. Users can specify that their tweets are private, viewable only by their followers or the direct recipient of a tweet, or publicly viewable. Users post their messages to Twitter via twitter.com, text messages, or third-party clients, including mobile applications. As of September 14, 2010, Twitter reported 175 million users and 95 million tweets sent per day [5]. We expect Twitter and Twitter-like services to continue to gain in popularity, and we expect our work to be relevant not only to Twitter but also to new services that emerge.


1. Twitter Attributes 





Twitter’s primary characteristic that differentiates it from e-mail, chat, or a standard blog is its 140-character length limit. In this respect, tweets have more in common with short message service (SMS) messages than with any other communication technology [6]. Many language conventions of chat and SMS, such as abbreviated spellings, acronyms, misspellings, and emoticons (combinations of characters that represent emotions, for instance a smiley face made from a colon and a right parenthesis), are also used extensively in Twitter. While some misspellings are accidental, others are for effect, such as writing “sleeeepy” instead of “sleepy.” Another technique we note in our Twitter corpus is the chat convention of placing asterisks before and after a statement to indicate action, for example “really? *bangs head on desk*.” The similarities between SMS and Twitter text suggest that analysis methods that work in one domain are likely to work in the other.


Twitter adds two unique message attributes beyond SMS: the @ sign followed by a user’s screen name to indicate a reference to that user, and the # sign followed by a topic tag for grouping and searching messages by topic thread. We shall refer to these attributes as @names and #tags. In [7], Boyd et al. found that, in a random sample of 720,000 tweets, 36% contained an @name and 5% contained a #tag. Figure 1 is an anonymized example of a tweet using these attributes. In this example, the sender directs the tweet to @User1 in a conversational manner, referencing @User2 within the comment.


@User1 no wonder @User2 never wrote me back #epicfail


Figure 1. Typical Twitter Message 
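The @names, #tags, and URLs described in this section can be located with simple pattern matching. The Python sketch below assumes simplified patterns (an @ or # followed by word characters, and URLs beginning with http:// or https://); Twitter’s actual rules for screen names and hashtags are more involved. The hypothetical strip_attributes helper shows the kind of preprocessing, removing @names and/or #tags, applied by Layton et al. in the study discussed later in this chapter.

```python
import re

# Simplified patterns (an assumption): @ or # followed by word characters,
# and URLs that start with http:// or https://.
NAME_RE = re.compile(r"@\w+")
TAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def extract_attributes(tweet: str) -> dict[str, list[str]]:
    """Return the @names, #tags, and URLs that appear in a tweet."""
    return {
        "names": NAME_RE.findall(tweet),
        "tags": TAG_RE.findall(tweet),
        "urls": URL_RE.findall(tweet),
    }


def strip_attributes(tweet: str, names: bool = True, tags: bool = True) -> str:
    """Remove @names and/or #tags, then collapse the leftover whitespace."""
    if names:
        tweet = NAME_RE.sub("", tweet)
    if tags:
        tweet = TAG_RE.sub("", tweet)
    return " ".join(tweet.split())


example = "@User1 no wonder @User2 never wrote me back #epicfail"
print(extract_attributes(example))
print(strip_attributes(example))  # no wonder never wrote me back
```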


Another common message attribute is the Internet URL. Because Twitter is a text-only communication medium, users include Uniform Resource Locator (URL) links to outside content they wish to share [8]. This practice has given rise to URL shorteners, services such as http://bit.ly that provide a short URL (e.g., http://bit.ly/a1b2c3) that redirects to a longer standard URL, enabling more efficient use of the limited message space.


C. PRIOR WORK IN AUTHORSHIP ATTRIBUTION


Authorship attribution takes a piece of written material and attempts to identify its author. Typically, this is done through a supervised learning process, taking material known to be written by an author and building a model from it, then gauging how well the writing in question fits the model. Researchers have found different ways to build these models. A discussion of several of these techniques follows, with an emphasis on those that have shown success with short messages.


1. Lexical Feature Analysis 


Lexical features treat the text as a series of tokens, with a token consisting of a word, number, punctuation mark, or some combination of alphanumeric characters. The author model consists of statistics such as the distribution of sentence length, vocabulary richness, and word frequencies. An example of vocabulary richness is the ratio of the number of unique words in a corpus to the total number of words in the corpus. Vectors built from word frequencies that include the most common words, such as prepositions and pronouns, represent the author’s style and are most often used in authorship detection. Vectors that discard these high-frequency words with little semantic content tend to perform better in topic detection [9].
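The two lexical statistics named above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the implementation of any cited author: a “word” is taken to be any run of letters, digits, or underscores, and the short function-word list in the example is hypothetical.

```python
import re
from collections import Counter

# Simplifying assumption: a word is any run of letters, digits, or underscores.
WORD_RE = re.compile(r"\w+")


def words(text: str) -> list[str]:
    """Lower-cased word tokens; punctuation is ignored."""
    return WORD_RE.findall(text.lower())


def vocabulary_richness(tokens: list[str]) -> float:
    """Ratio of unique words to total words (the type-token ratio)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def frequency_vector(tokens: list[str], vocabulary: list[str]) -> list[float]:
    """Relative frequency of each vocabulary word (e.g. common function words)."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocabulary]


tokens = words("The cat and the hat")
print(vocabulary_richness(tokens))                     # 0.8
print(frequency_vector(tokens, ["the", "and", "of"]))  # [0.4, 0.2, 0.0]
```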


In her 2007 thesis, Jane Lin used lexical features to profile authors of the NPS Chat Corpus by age and gender. In processing her corpus, she grouped Internet chat utterances by the age reported in each user’s profile, leaving punctuation marks intact. This allowed her to build a dictionary of common emoticons and use them as a feature for classification. In her analysis she used the following features: emoticon token counts, emoticon types per sentence, punctuation token counts, punctuation types per sentence, average sentence length, and average count of word types per document (vocabulary richness). She used a naive Bayes classifier, which we describe later, to compute classification accuracy both with and without prior probabilities [10].


Lin found that while classifying teens against 20-year-olds showed poor results, comparing teens to increasingly older age groups improved the results. The top F-score (a metric combining precision and recall, which we detail later) was 0.932, obtained by comparing teens to 50-year-olds. As most sexual predators are 26 and older, she compared those under 26 to those over 26, obtaining an F-score of 0.702. Based on these results and her data, she suggested that other machine learning techniques may perform better [10].
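For reference, the F-score is conventionally computed from precision, P = TP / (TP + FP), and recall, R = TP / (TP + FN); the balanced form, F1 = 2PR / (P + R), is their harmonic mean. The sketch below assumes that standard definition (we do not claim it is Lin’s exact code), with an optional beta weight, and uses hypothetical counts in the example.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure from true positive, false positive, and false negative counts.

    With beta = 1 this is F1, the harmonic mean of precision and recall;
    beta > 1 weights recall more heavily, beta < 1 weights precision.
    """
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses.
print(f_score(8, 2, 2))  # approximately 0.8
```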


2. N-Gram Feature Analysis 


While word features capture the style of the author well, they fail to capture certain features common to short messaging. Emoticons, abbreviations, and creative punctuation may carry morphological information useful in stylistic discrimination. Custom-built parsers, such as the one used by Lin in the work described above, could pull these features out of the text but add a level of complexity to tokenizing and smoothing [9]. An alternative approach uses character-level n-grams as the feature type. This method disregards language-specific information such as word spacing, letter case, or new line markers. It also eliminates the need for taggers, parsers, or any other complex text preprocessing.
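Character n-gram extraction needs no tokenizer at all, which is what makes it attractive for short messages. The Python sketch below is a generic illustration rather than code from [9] or [11]: it slides a window of n characters across the text and counts every overlapping n-gram, and the hypothetical ngram_profile helper keeps the L most frequent n-grams with frequencies normalized by the n-gram total. A byte-level variant would apply the same slicing to text.encode("utf-8").

```python
from collections import Counter
from typing import Optional


def char_ngrams(text: str, n: int) -> Counter:
    """Count every overlapping character n-gram; no tokenizing or parsing."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_profile(text: str, n: int, L: Optional[int] = None) -> dict[str, float]:
    """The L most common n-grams with frequencies normalized by the n-gram total.

    L=None keeps every n-gram in the text.
    """
    counts = char_ngrams(text, n)
    total = sum(counts.values()) or 1
    return {gram: c / total for gram, c in counts.most_common(L)}


# The elongated spelling from the Twitter discussion yields a repeated "eee".
print(char_ngrams("sleeeepy", 3))
```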





In [11], Keselj et al. used byte-level n-grams for authorship attribution of English, Greek, and Chinese texts. For each author they built a profile of the L most common character n-grams and their normalized frequencies. The basic premise of this method is that authorship is determined by the amount of similarity between the profiles of two texts: a test profile is attributed to the author whose profile it is least dissimilar from. The measure of dissimilarity is a normalized distance metric based on the n-gram frequencies within the text profiles, which they call the relative distance between two texts. For English texts by eight classic authors, they achieved 100% accuracy for several different n-gram and profile sizes. On Greek data sets drawn from newspaper texts they attained an accuracy of 85%, surpassing the previous best reported accuracy of 73% for that data set. These results suggest that byte-level n-grams are a useful feature for authorship attribution.
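As we understand the relative distance of Keselj et al. [11], each n-gram appearing in either profile contributes the square of its frequency difference divided by the average of its two frequencies, (2(f1 - f2) / (f1 + f2))^2, so an n-gram missing from one profile contributes the maximum of 4. The Python sketch below assumes profiles are dictionaries mapping n-grams to positive normalized frequencies; the author names and frequencies in the example are hypothetical.

```python
def relative_distance(p1: dict[str, float], p2: dict[str, float]) -> float:
    """Dissimilarity between two n-gram profiles (n-gram -> normalized frequency).

    Sums (2 * (f1 - f2) / (f1 + f2)) ** 2 over every n-gram in either profile;
    an n-gram absent from one profile has frequency 0 there and adds 4.
    """
    total = 0.0
    for gram in set(p1) | set(p2):
        f1 = p1.get(gram, 0.0)
        f2 = p2.get(gram, 0.0)
        total += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return total


def attribute(test: dict[str, float], authors: dict[str, dict[str, float]]) -> str:
    """Return the author whose profile is least dissimilar from the test profile."""
    return min(authors, key=lambda name: relative_distance(test, authors[name]))


# Hypothetical bigram profiles for two authors and one unknown text.
authors = {
    "author_a": {"lo": 0.5, "ol": 0.5},
    "author_b": {"th": 0.6, "he": 0.4},
}
print(attribute({"lo": 0.6, "ol": 0.4}, authors))  # author_a
```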


Frantzeskou et al. [12] expanded upon and simplified Keselj’s method of measuring the difference between two author profiles of byte-level n-gram features in order to apply the technique to a different textual domain. Instead of using Keselj’s normalized distance metric to differentiate authors, Frantzeskou et al. built profiles of the L most common n-grams used by the authors of computer source code samples. Unlike the previous method, this approach does not normalize the n-gram frequencies. They call this the Source Code Author Profiles (SCAP) method. The size of the intersection of two profile sets measures the similarity between their authors, and a test document is attributed to the author whose profile shares the largest intersection with the test document’s profile.
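The SCAP classification rule is simple enough to sketch directly. In this illustrative Python version (not Frantzeskou et al.’s code), a profile is just the set of the L most frequent character n-grams, with frequencies discarded, and a test text is attributed to the author whose profile overlaps its own profile the most. Passing L=None keeps every n-gram in the text. The author names and training texts in the example are hypothetical.

```python
from collections import Counter
from typing import Optional


def scap_profile(text: str, n: int, L: Optional[int] = None) -> set[str]:
    """The L most frequent character n-grams of text; frequencies are discarded."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {gram for gram, _ in counts.most_common(L)}


def scap_attribute(test_text: str, author_texts: dict[str, str],
                   n: int, L: Optional[int] = None) -> str:
    """Pick the author whose profile shares the most n-grams with the test profile."""
    test = scap_profile(test_text, n, L)
    return max(author_texts,
               key=lambda name: len(test & scap_profile(author_texts[name], n, L)))


# Hypothetical training text per author.
training = {
    "alice": "lol lol omg lol",
    "bob": "Indeed, quite so.",
}
print(scap_attribute("omg lol", training, n=3))  # alice
```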


Frantzeskou et al. applied both Keselj’s method and the SCAP method to a corpus of C++ programs from six authors. Both methods reached 100% accuracy at higher values of the profile size L (the number of n-grams per author), but SCAP performed slightly better at lower values of L and significantly better with bi-grams. On a corpus of Java code with no comments, SCAP again performed better, with accuracies from 92% to 100% across several values of L and n-gram sizes. The relative distance method performed well at lower values of L but poorly at the highest L value tested [12]. The SCAP method provides a mathematically simple and effective means of conducting authorship attribution on source code. While computer program source code and short messages have very different structures, both domains may at first glance give the impression of very broken, oddly punctuated English. Although Twitter covers a wider vocabulary range, authorship attribution methods effective in one domain may show similar effectiveness in the other.


The success of the SCAP method with source code led Layton, Watters, and Dazeley [3] to examine its viability for authorship attribution of short messages, specifically those sent via Twitter. They examined 50 users selected randomly from a set of 14,000. The 140-character limit of Twitter messages restricted the number of distinct n-grams per author sufficiently that they used a value of L that encompassed all n-grams used by an author. The value of n was varied from 2 to 7 characters. The experiment used three different text preprocessing methods to gauge the effect of the tagging conventions unique to Twitter: one method removed @names from the text, one removed #tags, and one removed both.


Applying the SCAP methodology to Twitter produced a best result of 72.9% accuracy using character 4-grams with both @names and #tags included in the message text. Removing @names had the largest effect, reducing accuracy by 26% on average, while removing #tags reduced accuracy by only 1% on average. This implies that including information about a user’s social network can significantly improve the ability to identify that user. The number of tweets per author beyond which accuracy did not significantly improve was found to be 120. This study showed that authorship attribution of short messages with the SCAP method performs much better than chance, and that adding information on the user’s social network significantly improves classification performance [3]. As short messages sent via SMS do not generally contain this social network information, their best accuracy result of 54.4% with both @names and #tags removed is a more realistic benchmark for authorship attribution of short messages.


This subsection described several different methods for authorship attribution in a variety of textual domains. Figure 2 summarizes the key points discussed.
Researcher  | Technique                                      | Textual domain             | Results
Lin         | Lexical features, naive Bayes                  | NPS Chat Corpus            | 0.932 F-score, teens vs. 50-yr-olds; 0.702 F-score, teens vs. over 26
Keselj      | Byte-level n-grams, normalized distance metric | English novels; Greek news | 100% accuracy, English novels; 85% accuracy, Greek newspaper
Frantzeskou | Character n-grams, SCAP                        | C++/Java source code       | 92-100% accuracy
Layton      | Character n-grams, SCAP                        | Twitter                    | 72.9% accuracy (best result); 54.4% accuracy without #tags or @names

Figure 2. Comparison of Several Authorship Attribution Techniques on Different Textual Domains (After [10], [11], [12], [3])





D. PRIOR WORK IN DEVICE IDENTIFICATION 


Accurate identification of individuals on a network is an important security concern. A number of security exploits involve mimicking an authorized user to gain access to a network. A parallel problem is identifying individuals involved in nefarious activities who may try to obfuscate their communications by routinely changing devices or otherwise misrepresenting themselves on a communications network. A passive means of correctly identifying an authorized device and its user by means of network characteristics, electronic emissions, and/or textual analysis could minimize the impact of spoofing attacks and contribute to intelligence or law enforcement efforts to track a specific individual.


Research in the 802.11 wireless domain shows that individual devices can be identified quite well by their radiometric signatures, even among devices of the same brand, because of inherent variability in the manufacturing process. Other research has focused on authorship attribution based on analysis of an individual’s language use. To our knowledge, no research to date has combined the two identification methods in an effort to improve the classification of users to devices in a network. This study attempts to do so, with a focus on wireless and cellular SMS communications.








Identification of radio frequency (RF) transmitters by their signal characteristics has been accomplished with good success, particularly in the radar domain. That technology has advanced from basic measures of frequency, amplitude, and pulse width to fine-grained analysis of unintentional modulation on pulse (UMOP), which looks at pulse artifacts unique to individual transmitters. Once a radar is positively identified as transmitting a signal, that radar can be recognized by that signal in the future, and unknown radars can be classified by manufacturer. A Litton Applied Technology UMOP analysis method was able to identify radars at a 90-95% confidence level in the early
1. Signal Transient Characteristic Method


Communications and data signals can be more complex than radar signals, with different modulation schemes, spread spectrum technology, and frequency hopping to enhance security, reliability, and capacity. Several methods have been proposed to “fingerprint” wireless transmitters by their physical, link, or application layer characteristics.





Danev and Capkun have proposed a method to fingerprint 802.15.4 CC2420 radios by analyzing RF signal transient characteristics [14]. When an RF signal is transmitted, there is a period at the start of the signal where the amplitude ramps up from no energy to full transmission power for the packet. This part of the signal is the transient, and its characteristics vary depending on the analog hardware of the transmitter. Danev and Capkun extracted transients from 500 signals and applied a feature selection process to obtain distinctive templates for each radio. This process consisted of a transformation stage and a feature extraction stage. The transform that gave them the best results measured the relative differences between adjacent fast Fourier transform (FFT) spectra. The feature extraction stage took the transformed transient data and extracted spectral Fisher features using a linear transformation derived from linear discriminant analysis (LDA). They show that their process identifies sensor nodes with an accuracy of 99.5% on a set of 50 radios made by the same manufacturer. They did find that changes in antenna polarization reduced their accuracy, so this method works well only with fixed-location transmitters and receivers.


2. Steady State Signal Characteristic Method 


Another identification method, described by Candore, Kocabas, and Koushanfar, examines the steady-state part of the signal for unique characteristics imparted by the transmitter hardware [15]. They do this by developing individual, possibly weak, classifiers for the following characteristics: frequency difference, magnitude difference, phase difference, distance vector, and I/Q origin offset, where difference/distance/offset refers to the difference between the ideal and actual measured values of the signal. These individual classifiers are then combined with weighted voting to form a stronger classifier. Their work uses a Wireless Open-Access Research Platform (WARP) built around a computer, a field-programmable gate array (FPGA) for digital signal processing, and radio cards operating in the 2.4 GHz and 5 GHz bands. They use differential quadrature phase-shift keying (DQPSK) modulation and extract their signal signatures in the modulation domain. After training the classifier on data collected from 200 frames of 1844 random symbols, they test it using five frames. At five frames, results were rather poor for six different radios. Testing with at least 25 frames, the individual characteristic classifiers each surpassed 50% identification accuracy. Combining the classifiers with weighted voting, they achieved 88% accuracy in correct transmitter identification, with a 12.8% false alarm probability, on five frames. One reason they suggested for the less than perfect identification results is that their WARP radio cards contain many digital components, which would have less inherent variability than radios with more analog components in the transmission processing stream. If that is true, our software-oriented test system may show the same signal stability.
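The weighted-voting step that Candore et al. use to combine their per-characteristic classifiers can be illustrated as follows. This is a generic Python sketch, not their implementation: each weak classifier casts one vote for a transmitter, and each vote is weighted by a value that here is hypothetical, standing in for something like that classifier’s accuracy on validation data.

```python
from collections import defaultdict


def weighted_vote(votes: dict[str, str], weights: dict[str, float]) -> str:
    """Combine weak classifiers by weighted majority vote.

    votes maps each signal characteristic (e.g. "frequency") to the transmitter
    its classifier chose; weights maps the same characteristics to vote weights.
    """
    tally: dict[str, float] = defaultdict(float)
    for characteristic, transmitter in votes.items():
        tally[transmitter] += weights.get(characteristic, 0.0)
    return max(tally, key=tally.get)


# Hypothetical decisions and weights for the five characteristics.
votes = {"frequency": "radio2", "magnitude": "radio1", "phase": "radio2",
         "distance": "radio1", "iq_offset": "radio1"}
weights = {"frequency": 0.9, "magnitude": 0.3, "phase": 0.6,
           "distance": 0.2, "iq_offset": 0.3}
print(weighted_vote(votes, weights))  # radio2
```

Three of the five weak classifiers vote for radio1, but the two more heavily weighted ones carry the decision to radio2, which is the point of weighting rather than counting votes.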
3. Modulation Characteristic Method


The modulation domain was used again in a paper by Brik et al., this time applied to 802.11 network interface cards (NICs) [4]. They developed a methodology called the passive radiometric device identification system (PARADIS). Four of the five characteristics they used were the same as in the WARP paper: frequency error, I/Q origin offset, magnitude error, and phase error. They also used another characteristic called SYNC correlation, which is the difference between the measured and ideal I/Q values of the SYNC, the short signal used to synchronize the transmitter and receiver prior to transmitting the data. The 802.11 physical layer, in many instances, encodes data on two carrier components, in-phase (I) and quadrature (Q), that are separated in phase by π/2. In quadrature phase shift keying (QPSK), each symbol encodes two data bits and is represented by a point in the modulation domain; a constellation diagram plots the four possible points, one in each quadrant of a two-dimensional I/Q grid. Errors in modulation are usually measured by comparing the vectors (phasors) corresponding to the ideal and measured I and Q values at a point in time. Phase error is the angle between the ideal and measured phasors. Error vector magnitude is the magnitude of the vector difference between the ideal and measured phasors. These errors are averaged across all symbols in the frame in order to minimize the effects of channel noise. Figures 3 and 4 are a graphical display of the error measurements.
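The phase error, magnitude error, and error vector magnitude just defined can be computed directly from received I/Q samples. The Python sketch below assumes an ideal QPSK constellation of unit-magnitude points at odd multiples of 45 degrees and decides each symbol by its nearest ideal point. It is a simplification: real analyzers such as the one Brik et al. used also correct frequency, timing, and I/Q offsets first, and often report EVM as a normalized RMS value rather than the simple mean used here. The test frame is synthetic.

```python
import cmath
import math

# Ideal QPSK constellation: one unit-magnitude point in each quadrant.
IDEAL = [cmath.rect(1.0, math.pi / 4 + k * math.pi / 2) for k in range(4)]


def modulation_errors(received: list[complex]) -> dict[str, float]:
    """Average modulation errors over a frame of received I/Q symbols."""
    phase = magnitude = evm = 0.0
    for r in received:
        ideal = min(IDEAL, key=lambda p: abs(r - p))  # nearest ideal point
        phase += abs(cmath.phase(r / ideal))          # angle between phasors (rad)
        magnitude += abs(abs(r) - abs(ideal))         # difference in phasor lengths
        evm += abs(r - ideal)                         # length of the error vector
    n = len(received)
    return {"phase_error": phase / n,
            "magnitude_error": magnitude / n,
            "error_vector_magnitude": evm / n}


# Synthetic frame: every ideal symbol rotated by 0.1 rad, length unchanged.
frame = [p * cmath.exp(0.1j) for p in IDEAL]
print(modulation_errors(frame))
```

For a pure 0.1 rad rotation, the phase error is 0.1 rad, the magnitude error is essentially zero, and the error vector has length 2 sin(0.05), illustrating why the characteristics carry different information about the same distortion.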


[Figure: QPSK constellation on the I/Q plane, labeled “ideal point,” “worst-case error,” and “symbol bit value”]
Figure 3. QPSK Error Shown on an I/Q Plane (From [4])

[Figure: vector display, labeled “ideal signal”]
Figure 4. Vector Display of Modulation Errors (From [4])





For their experiment, the Brik group used identical Atheros NICs configured as 802.11b access points and an Agilent vector signal analyzer as the sensor. They tried both k-nearest neighbor (kNN) and support vector machine (SVM) classifiers to associate a MAC address with a NIC based on the collected modulation parameters. After evaluating data from 138 NICs, they found the best feature set for SVM to be, in order, frequency error, SYNC correlation, I/Q offset, magnitude error, and phase error; for kNN, it was frequency error, SYNC correlation, and I/Q offset. The SVM classifier error rate was 0.34%, and the kNN classifier error rate was 3%. Based on their data, no NIC was able to masquerade as another. Modulation similarities were under 5% for 99% of the cards; one NIC had a 17% similarity to others [4]. They also suggest that this method could work with any digital modulation scheme.
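To make the kNN scheme concrete, here is a minimal nearest-neighbor classifier over modulation feature vectors. It is a generic Python sketch, not PARADIS: the two-element feature vectors (standing in for, say, frequency error and SYNC correlation) and the NIC labels are hypothetical, and the features are assumed to be already scaled to comparable ranges, which matters because Euclidean distance is sensitive to feature units.

```python
import math
from collections import Counter
from typing import Sequence


def knn_classify(sample: Sequence[float],
                 training: list[tuple[Sequence[float], str]],
                 k: int = 3) -> str:
    """Label sample by majority vote among its k nearest training vectors."""
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, pre-scaled feature vectors for two NICs.
training = [
    ((0.10, 0.20), "nic_a"), ((0.12, 0.18), "nic_a"), ((0.11, 0.21), "nic_a"),
    ((0.90, 0.80), "nic_b"), ((0.88, 0.82), "nic_b"), ((0.91, 0.79), "nic_b"),
]
print(knn_classify((0.13, 0.19), training))  # nic_a
```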


4. Transport Layer Characteristic Method 


A passive fingerprinting technique proposed by Kohno, Broido, and Claffy eschews physical layer signal analysis, instead exploiting the transport layer for identity information by measuring clock skew in transmission control protocol (TCP) timestamps [16]. Their method exploits two clocks on a computer: the system time clock and a TCP timestamps option clock internal to the TCP network stack. The system time clock may or may not be synchronized with true time through a Network Time Protocol server; if not, the difference between system time and true time can be measured. Most modern operating systems enable the TCP timestamps option in their network stacks, so each TCP packet sent contains a 32-bit timestamp embedded in the packet header. Kohno et al. describe methods for passively collecting TCP timestamps from computers running various operating systems, and formulas for calculating clock skew from the timestamps. They also describe a method for estimating system clock skew by sending Internet control message protocol (ICMP) Timestamp Requests to a targeted device, but focus on the TCP method, as most network stacks use clocks operating at lower frequencies than system clocks. Also, many routers and firewalls filter ICMP messages. For clock skew measurement to be effective, different devices must have different clock skews, and the skews must be consistent over time. Others have shown that both assumptions hold; Kohno et al. verify this by collecting two hours of traffic on a major link, using their process to find the clock skew for the first hour, the second hour, and the entire period, and comparing the results for each source that was active for at least 30 minutes of each hour. A plot of their findings shows that they were able to differentiate some individual machines by their clock skew, but not all. This is an interesting method, but it is not useful in our research, as cell phones synchronize their clocks with their network upon connection.
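Although the method does not suit our cell phone setting, the core skew computation is easy to sketch. Each packet yields an offset: the time elapsed on the sender’s TCP timestamp clock, converted to seconds using the clock’s tick rate, minus the time elapsed at the measurer; the skew is the slope of offset versus elapsed time. Kohno et al. use a more robust linear-programming fit for that slope; the Python sketch below substitutes an ordinary least-squares fit for simplicity and assumes the tick rate (hz) is known and the measurer’s clock is accurate. The trace in the example is synthetic.

```python
def estimate_skew_ppm(observations: list[tuple[float, int]], hz: float) -> float:
    """Estimate a sender's clock skew in parts per million.

    observations: (measurer receive time in seconds, TCP timestamp in ticks),
    in arrival order. hz: assumed tick rate of the sender's timestamp clock.
    Uses an ordinary least-squares slope (a simplification of [16]'s method).
    """
    t0, ts0 = observations[0]
    xs = [t - t0 for t, _ in observations]                 # elapsed at measurer
    ys = [(ts - ts0) / hz - x                              # sender minus measurer
          for (_, ts), x in zip(observations, xs)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope * 1e6


# Synthetic trace: a 100 Hz timestamp clock running 50 ppm fast, sampled every minute.
obs = [(float(t), round(1000 + 100 * t * (1 + 50e-6))) for t in range(0, 3600, 60)]
print(estimate_skew_ppm(obs, hz=100))  # close to 50
```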
E. GSM OVERVIEW 


The Global System for Mobile Communications (GSM) standard is the basis for the most popular mobile phone system in the world, with over 3 billion connections [17]. Its ubiquity and well-established hardware technology make it a good platform for experimentation and a good target for exploitation. GSM operates as a cellular network with a set of base stations distributed over a service area. The distribution is based on the desired coverage level, which depends on geography and connection demand. A rural area may have a few high-powered base stations spread out over a large area, while an urban area might have many lower-powered units in close proximity [18]. The structure of a GSM network is shown in Figure 5. The two left blocks of Figure 5 contain the part of the network relevant to this study: the handset, the base transceiver station (BTS), and the air, or Um, interface between the two.





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Figure 5. GSM Network Structure (From [19]) 


1. GSM Network Infrastructure 


In a GSM network, the BTS contains the antennas, the transceivers for transmitting and receiving RF signals, and encryption gear as needed. While the complete capabilities of the BTS vary depending on the network provider, its minimum function is to receive the modulated analog RF signal from the handset, convert it to a modulated digital signal, and send it to the base station controller (BSC). The BTS can contain more functionality, including handling handover between cells. The BTS is controlled by the BSC, which typically controls several BTSs in a network. The BSC manages the frequency channels used by its towers, handles handovers and switching among its towers, and may convert the air interface's voice channel coding to the coding used in the circuit-switched Public Switched Telephone Network (PSTN) [20]. A small, limited network can be assembled using only a BTS with appropriate software to manage a specific number of handsets. The network assembled for the experimentation conducted here is one such limited network.


2. Mobile Handset


The end of the cellular wireless network most familiar to typical users is the handset. Along with a transceiver and digital signal processing unit, a GSM handset also contains the subscriber identity module (SIM) card. The SIM card identifies the user to the network, allowing the network to provide or deny access to the user. A user can easily switch phones and still access their subscribed services by transferring their SIM card to the new phone, assuming that phone is unlocked and compatible with the network technology. The identifying feature of the SIM card is the International Mobile Subscriber Identity (IMSI) number. Each SIM card has a unique IMSI associated with the user. The phone itself also has a unique identifier, the International Mobile Equipment Identity (IMEI) number [20]. These two numbers are unrelated, though both may be transmitted through the network as part of control signal metadata.





The air interface between the handset and the BTS is the focus of part of the experimentation conducted here. GSM providers operate in the licensed 450 MHz, 850 MHz, 900 MHz, 1800 MHz, and 1900 MHz radio frequency bands. Uplink and downlink bands are typically each 25 MHz wide and separated by 45 to 50 MHz. Each of these bands is divided into 124 carrier frequencies with a 200 kHz bandwidth. An uplink/downlink channel pair is referred to by an absolute radio frequency channel number (ARFCN). Time-division multiplexing is used to divide each channel into eight time slots. A single timeslot in a specific ARFCN is called a physical channel (PCH) [21]. Thus, GSM combines frequency-division multiplexing (FDM) and time-division multiplexing (TDM) to make the most efficient use of its spectrum assignment. Each timeslot, or burst, generally consists of two 57-bit data fields separated by a 26-bit “training sequence” for equalization, a 1-bit stealing flag beside each data field, three tail bits at each end, and an 8.25-bit guard period. Gaussian Minimum-Shift Keying (GMSK) is the modulation scheme used to modulate the digital data onto the analog RF signal [21].
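The channel and burst arithmetic can be checked with a short Python sketch. It assumes the primary GSM-900 (P-GSM) channel plan, in which uplink carrier n sits at 890 + 0.2n MHz and the downlink 45 MHz higher, and the normal-burst layout with two stealing flags; other bands use different offsets and channel ranges, and the function names are hypothetical.

```python
from fractions import Fraction

# Normal-burst field lengths in bit periods (guard period is 8.25 bits).
NORMAL_BURST_BITS = {
    "tail_start": 3, "data_1": 57, "steal_1": 1, "training": 26,
    "steal_2": 1, "data_2": 57, "tail_end": 3, "guard": Fraction(33, 4),
}
BIT_PERIOD_US = Fraction(48, 13)  # GSM bit period, about 3.69 microseconds


def pgsm900_arfcn_to_mhz(arfcn: int) -> tuple[float, float]:
    """Return (uplink, downlink) carrier frequencies in MHz for P-GSM 900."""
    if not 1 <= arfcn <= 124:
        raise ValueError("P-GSM 900 ARFCNs run from 1 to 124")
    uplink = 890.0 + 0.2 * arfcn
    return uplink, uplink + 45.0  # 45 MHz duplex spacing in this band


def burst_bits() -> Fraction:
    """Total length of one normal burst in bit periods."""
    return sum(NORMAL_BURST_BITS.values())


def timeslot_us() -> float:
    """Duration of one timeslot in microseconds."""
    return float(burst_bits() * BIT_PERIOD_US)
```

One burst is 156.25 bit periods, or about 577 µs, and eight of them make a TDMA frame of about 4.615 ms.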


3. GSM Modulation 


GSM uses the Gaussian Minimum Shift Keying modulation scheme. This modulation method applies a Gaussian filter to the data signal prior to the MSK modulator. MSK is a form of digital frequency modulation with a modulation index of 0.5. It has several properties that make it good for efficient mobile radio use: a constant envelope, a narrow bandwidth, and coherent detection capability. These properties make it relatively robust against noise. What MSK lacks is the ability to minimize energy occurring out-of-band in transmission. The Gaussian filter has a narrow bandwidth and sharp cutoff properties that suppress extraneous frequencies, shaping the input data waveform so that the output fits a constant envelope. Because GSM carries a single channel per carrier, with carriers spaced only 200 kHz apart, off-carrier energy must be kept to a minimum, and thus the Gaussian filter is important to clear transmission [22].
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The modulation sequence of a typical GMSK signal 
modulator is shown in Figure 6. In this example, a stream 


of binary data formed in a Non-Return-To-Zero (NRZ) 





sequence is sampled and integrated into an analog signal. 
It is then convoluted with a Gaussian function to filter 
out the energy outside the Gaussian form. The real, in 
phase (I) and quadrature (Q) components of the data signal 
are calculated, then modulated onto the I and Q carrier 
waves. The two components are added, and the modulated 


Signal is formed [23]. 
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Figure 6. GMSK Modulation Block Diagram (From [23]) 


Demodulation of the GMSK signal is more complicated, particularly for GSM applications. Operating in the 900 MHz range, GSM is subject to a significant amount of interference, including signal attenuation, multipath propagation, and co-channel or adjacent band interference. The GSM standard does not specify a demodulation algorithm, but it does require that the receiver handle two multipath signals of equal power received up to 16 µs apart. This implies that an equalizer is required to separate the signals. Viterbi demodulation incorporates an assumption about the possible signal and additive noise and uses a probabilistic maximum likelihood calculation to produce the most probable received signal [23]. A diagram of a typical demodulator is shown in Figure 7. It splits the received signal into the I and Q components and demodulates each from its carrier wave. After passing through a low-pass filter to clean up some of the noise, the I and Q components of the data stream are combined and the signal is converted back to a digital NRZ signal [23].
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Figure 7. GMSK Demodulation Block Diagram (From [23]) 


F. MACHINE LEARNING TECHNIQUES 


Authorship attribution entails creating a profile of an author and matching that profile to a piece of text. Machine learning accomplishes this by building a model based on statistical methods, then customizing the model with training data or previous experience. The goal of the model is not to memorize the behavior of the training data, but to use it to decide whether new data points fit the pattern. While there are many machine learning techniques based on different statistical mechanisms, this research employs naive Bayes.

1. Naive Bayes Classifier 

The naive Bayes classifier uses Bayes’ Rule of probability to assign a given set of features to a class:

$$P(C \mid F) = \frac{P(F \mid C)\,P(C)}{P(F)}$$

Bayes’ rule is particularly useful in the many practical situations where it is easier to estimate the conditional probability of a particular feature given a class. The conditional probability of the class given the features, $P(C \mid F)$, depends on the probabilities of the class and the features and on the probability of the features given the class. When $F$ is a vector of $d$ random feature values, $F = (f_1, f_2, \ldots, f_d)$, and $c_k$ is one of $m$ random classes conditional on the feature set, $C = (c_1, c_2, \ldots, c_m)$, Bayes’ Rule may be expressed as [24]:

$$P(c_k \mid F) = \frac{P(F \mid c_k)\,P(c_k)}{P(F)}$$


The classification problem becomes simple when $P(c_k \mid F)$ is known; as discussed in [10], [25], and [26], the document with feature vector $F$ is assigned to the class with the highest conditional probability value, $c^*$:

$$c^* = \arg\max_{c_k \in C} \frac{P(F \mid c_k)\,P(c_k)}{P(F)}$$
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The P(F) term does not change between classes, which allows 
us to omit it from the argmax term, simplifying the above 


formula to: 


c* = arg max| P(F lc, )P(c,)] 


cp EC 


In a standard authorship attribution problem, that conditional probability is not known and must be estimated from the data and Bayes’ rule. One assumption we make in using naive Bayes is that the occurrence of any one feature $f_i$ is independent of any other feature $f_j$ in a document of class $c_k$; that is, the features are conditionally independent given the class. Thus, the distribution of the feature vector over $c_k$ may be modeled as:

$$P(F \mid c_k) = \prod_{j=1}^{d} P(f_j \mid c_k)$$

Combining the two previous formulas gives the following:

$$c^* = \arg\max_{c_k \in C} \left[ P(c_k) \prod_{j=1}^{d} P(f_j \mid c_k) \right]$$


The product operation applied to probabilities can cause the product in the above equation to become vanishingly small. This is a particular concern when working with n-gram features, as the probability values of some n-grams over a large amount of text may be very small to start with. Changing the product term to a sum of logarithms prevents numeric underflow, and because the logarithm is monotonically increasing, the class that maximizes the expression is unchanged:

$$c^* = \arg\max_{c_k \in C} \left[ \log P(c_k) + \sum_{j=1}^{d} \log P(f_j \mid c_k) \right]$$
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The P(cxy) term reflects the prior probability of the 





class occurring in the data set. This is typically modeled 
in one of two ways: as a uniform distribution of classes, 
or as the actual proportion of the count of the class in 
the training data. A training set containing equal 
occurrences of four classes gives a prior probability of 
0.25. One in which half the class occurrences belong to c, 
gives that class a prior probability of 0.5. Thus the 
balance of classes in the training data affects the naive 


Bayes classifier result. 
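The log-space decision rule and the two ways of setting priors can be sketched in Python. The sketch assumes the per-class feature probabilities $P(f_j \mid c_k)$ have already been estimated and smoothed so that none is zero, and that a document is a list of feature tokens scored once per occurrence. The function and argument names are hypothetical, not those of the classifier package used later.

```python
import math
from collections import Counter


def priors_from_labels(labels, uniform=False):
    """Class priors: uniform, or proportional to class counts in the training labels."""
    counts = Counter(labels)
    if uniform:
        return {c: 1.0 / len(counts) for c in counts}
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def classify(features, priors, cond_prob):
    """Return argmax over classes of log P(c_k) + sum_j log P(f_j | c_k).

    cond_prob[c][f] holds a smoothed, nonzero estimate of P(f | c).
    """
    def score(c):
        return math.log(priors[c]) + sum(math.log(cond_prob[c][f]) for f in features)

    return max(priors, key=score)
```

Multiplying 400 feature probabilities of 0.001 directly yields exactly 0.0 in double precision for every class, so no class can win; the log-space scores stay finite (about -2763 in that case) and still rank the classes correctly.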


2. Smoothing 


The naive Bayes classifier builds a probabilistic model of a class based on training data from that class. A problem arises when the test data contains features that the model has not seen in training. These zero-count features have zero probability, which drives the whole product to zero and leaves the naive Bayes classifier unable to predict a class. Smoothing, the process of shifting probability mass from frequently appearing features to zero-count features while retaining their relative influence on the classifier, mitigates this problem. Two smoothing techniques, Laplace and Witten-Bell, are discussed here.


a. Laplace Smoothing 


A simple algorithm, Laplace smoothing adds a value of 1 to each feature count in the data set, both test and training. This prevents a zero probability situation by ensuring every feature has a probability of occurring based on at least a single count, even if it does not appear in the training data. Adding to the feature counts requires a similar adjustment in the normalization step. If $N$ is the total count of all tokens in the data set and $V$ is the count of unique tokens, or types, a total of $V$ is added to the individual counts by adding 1 to each [26]. The normalization must therefore also be adjusted by $V$, giving the Laplace probability formula for a term $t_i$ with count $c_i$:

$$P_{\text{Laplace}}(t_i) = \frac{c_i + 1}{N + V}$$
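As a minimal Python sketch, assuming the vocabulary is the set of all types that will ever be scored (training and test), Laplace estimates for every type can be computed as follows; the function name is hypothetical.

```python
from collections import Counter


def laplace_probs(train_tokens, vocabulary):
    """Add-one estimate P(t) = (c_t + 1) / (N + V) for every type in the vocabulary."""
    counts = Counter(train_tokens)
    vocab = set(vocabulary) | set(counts)  # make sure every training type is included
    n = sum(counts.values())  # N: total training tokens
    v = len(vocab)            # V: number of types
    return {t: (counts[t] + 1) / (n + v) for t in vocab}
```

With training tokens `ab ab cd` and an unseen type `ef`, N = 3 and V = 3, so the estimates are 3/6, 2/6, and 1/6, and they still sum to one.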
b. Witten-Bell Smoothing 


Instead of altering the count of all features in the data set, Witten-Bell uses the probabilities of the features occurring in the training set to estimate the probability of an unseen feature. As the training set is processed, the probability that the next token will be of an already-seen type $i$, with count $c_i$, is given by [27]:

$$P_{\text{WB}}(t_i) = \frac{c_i}{n + v}$$

where $n$ is the number of tokens seen so far and $v$ is the number of types seen so far. The total probability of an unseen type occurring next is based on the fact that a new type has already occurred $v$ times in the training set (once at the first appearance of each type), and is given by [27]:

$$P_{\text{WB}}(\text{novel}) = \frac{v}{n + v}$$
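A minimal sketch of these two estimates in Python follows. How the novel-type mass is divided among individual unseen types is not specified above; the sketch assumes, as a hypothetical choice, an even split over a supplied number of unseen types.

```python
from collections import Counter


def witten_bell(train_tokens, n_unseen_types):
    """Return (probability per seen type, probability per unseen type)."""
    counts = Counter(train_tokens)
    n = sum(counts.values())  # tokens seen so far
    v = len(counts)           # types seen so far
    seen = {t: c / (n + v) for t, c in counts.items()}
    novel_mass = v / (n + v)  # total probability reserved for unseen types
    # Hypothetical even split of the reserved mass across unseen types.
    per_unseen = novel_mass / n_unseen_types if n_unseen_types else 0.0
    return seen, per_unseen
```

For training tokens `ab ab cd`, n = 3 and v = 2, so the seen types get 0.4 and 0.2 and 0.4 is reserved for novel types. A corpus with many distinct types thus reserves more mass for the unseen, unlike Laplace smoothing, which adds a fixed count regardless.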


3. Combining Classifiers


A classifier for detecting a device-to-user binding must derive information from both the user and the device. In this research, the user is modeled by their short message writing style and the device is modeled by signal characteristics. The variety of features used makes it mathematically difficult to simply plug them all into one high-dimensional classifier, though it is possible with appropriate normalization of the data. The fields of biometrics, image analysis, and handwriting analysis also use diverse feature sets for classification of target items. Researchers in these fields have developed methods to combine multiple classifiers, each focusing on a single feature type, into a multimodal classifier system producing accuracy rates superior to those of the individual classifiers used independently.


Design of a multimodal classifier depends on the outputs of the individual input classifiers. When combining single class labels, a majority vote scheme may be used. The class labels output by each component classifier are counted, and the class that collects the most votes is selected as the output of the combined classifier [28]. Variants of this system may apply weights, potentially learned, to the inputs of the combined classifier based on a quality metric, or may require the winning class to have more than a simple majority. Input classifiers providing a set of ranked class labels use a combined classifier that joins the individual sets and re-ranks the labels, selecting the top-ranked label as the output [29].
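Plain and weighted voting can be sketched in a few lines of Python. The weights are hypothetical inputs (for example, each classifier's validation accuracy), and ties go to the label that appears first, an arbitrary choice made only for determinism.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label output by the most component classifiers."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels, weights):
    """Sum each classifier's weight into the label it voted for; highest total wins."""
    totals = {}
    for label, w in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)
```

Votes of `a`, `b`, `a` elect `a` under the plain rule, but if the second classifier carries weight 3 and the others weight 1, the weighted rule elects `b`.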


The input classifiers that generate the greatest amount of classification information provide the probability distribution of the class labels, such as the posterior probabilities produced by a Bayesian classifier. [29] shows how the output probabilities $P_k(C_i \mid x)$ of several Bayesian classifiers may be averaged to create posterior probabilities for the combined classifier:

$$P_{av}(C_i \mid x) = \frac{1}{K} \sum_{k=1}^{K} P_k(C_i \mid x)$$

where $i$ ranges from 1 to $M$ classes and $k$ from 1 to $K$ classifiers. The class selected by the combined classifier is the one with the maximum value of $P_{av}(C_i \mid x)$. A similar method uses the median value of the posterior probabilities, as averages can be skewed by large outlier values. The combined posterior probabilities become:

$$P_{med}(C_i \mid x) = \frac{\bar{P}(C_i \mid x)}{\sum_{j=1}^{M} \bar{P}(C_j \mid x)}$$

where $\bar{P}(C_i \mid x)$ is the median value of $P_k(C_i \mid x)$ over the $K$ classifiers for class $C_i$.


These methods provide a simple way to combine the output probabilities of Bayesian classifiers, with the median technique providing particularly good results, as discussed in the biometric experimentation below.
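Both rules can be sketched in Python, assuming each of the K classifiers returns a list of M posterior probabilities in the same class order; class indices stand in for labels, and the function names are hypothetical.

```python
from statistics import median


def mean_rule(posteriors):
    """posteriors[k][i] = P_k(C_i | x); return the class index maximizing the mean."""
    m = len(posteriors[0])
    avg = [sum(p[i] for p in posteriors) / len(posteriors) for i in range(m)]
    return max(range(m), key=avg.__getitem__)


def median_rule(posteriors):
    """Normalize the per-class medians to sum to 1, then return the argmax class index."""
    m = len(posteriors[0])
    med = [median(p[i] for p in posteriors) for i in range(m)]
    total = sum(med)
    combined = [v / total for v in med]  # normalization leaves the argmax unchanged
    return max(range(m), key=combined.__getitem__)
```

With posteriors `[[0.45, 0.55], [0.45, 0.55], [0.99, 0.01]]`, the single confident outlier pulls the mean toward class 0 (0.63 versus 0.37), while the medians (0.45 versus 0.55) still favor class 1, which is the outlier robustness described above.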


Bayesian probability theory lends itself to developing classifier combination schemes using the probability distributions output by individual classifiers. [28] provides the derivation of the product and sum rules based on the joint probability distribution $p(x_1, \ldots, x_R \mid C_k)$ of the $R$ measurements used by the individual classifiers. Assuming the measurements are statistically independent given the class, the probability distribution becomes the product of all the individual probability values $p(x_j \mid C_k)$. Applying Bayes’ rule and the Bayes classifier decision process yields the product rule, under which a pattern $Z$ is assigned to class $C_i$ if:

$$P^{-(R-1)}(C_i) \prod_{j=1}^{R} P(C_i \mid x_j) = \max_{k=1}^{m} P^{-(R-1)}(C_k) \prod_{j=1}^{R} P(C_k \mid x_j)$$


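The sum rule makes the assumption that the posterior probabilities of the individual classifiers will not differ significantly from the prior probabilities. In that situation, the posterior probabilities may be expressed as:

$$P(C_i \mid x_j) = P(C_i)\,(1 + \delta_{ij})$$

where $\delta_{ij} \ll 1$. Substituting this value into the product rule form gives:

$$P^{-(R-1)}(C_i) \prod_{j=1}^{R} P(C_i \mid x_j) = P(C_i) \prod_{j=1}^{R} (1 + \delta_{ij})$$

Expanding the product on the right-hand side of the above equation and ignoring the second- and higher-order terms, as they will be negligibly small, allows us to rewrite the equation as:

$$P^{-(R-1)}(C_i) \prod_{j=1}^{R} P(C_i \mid x_j) \approx P(C_i) + P(C_i) \sum_{j=1}^{R} \delta_{ij}$$

Since $P(C_i)\,\delta_{ij} = P(C_i \mid x_j) - P(C_i)$, the right-hand side equals $(1 - R)P(C_i) + \sum_{j=1}^{R} P(C_i \mid x_j)$. The decision rule for the sum method then states that $Z$ is assigned to $C_i$ if:

$$(1 - R)P(C_i) + \sum_{j=1}^{R} P(C_i \mid x_j) = \max_{k=1}^{m} \left[ (1 - R)P(C_k) + \sum_{j=1}^{R} P(C_k \mid x_j) \right]$$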
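The two decision rules can be sketched in Python. The sketch assumes `posteriors[j][i]` holds $P(C_i \mid x_j)$ from classifier $j$ and `priors[i]` holds $P(C_i)$, with all priors nonzero; class indices stand in for labels, and the function names are hypothetical.

```python
import math


def product_rule(posteriors, priors):
    """argmax_i of P(C_i)^-(R-1) * prod_j P(C_i | x_j)."""
    r = len(posteriors)
    scores = [priors[i] ** -(r - 1) * math.prod(p[i] for p in posteriors)
              for i in range(len(priors))]
    return max(range(len(priors)), key=scores.__getitem__)


def sum_rule(posteriors, priors):
    """argmax_i of (1 - R) * P(C_i) + sum_j P(C_i | x_j)."""
    r = len(posteriors)
    scores = [(1 - r) * priors[i] + sum(p[i] for p in posteriors)
              for i in range(len(priors))]
    return max(range(len(priors)), key=scores.__getitem__)
```

With equal priors and posteriors `[[0.8, 0.2], [0.02, 0.98], [0.8, 0.2]]`, one near-zero estimate vetoes class 0 under the product rule (0.0128 versus 0.0392 before the shared prior factor), while the sum rule still chooses class 0 (1.62 versus 1.38 before the shared offset). This illustrates the sum rule's lower sensitivity to a single poorly estimated posterior.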
In an experimental comparison of classifier combination methods, [28] evaluated three biometric modalities: frontal face image, face profile image, and voice. For 37 users, the face classifiers were trained with three pictures and tested with one. Similarity in facial images was gauged by distance measurements. The voice classifier used Hidden Markov Models to classify utterances of digits from zero to nine. Results for the individual classifiers showed that speech provided the best performance with a 1.4% error rate, followed by profile images with 8.5% and frontal face images with 12.2%. When the results of the three classifiers were combined using the techniques described above, the sum rule provided the best results, with a 0.7% error rate. The product rule gave 1.4% and the median rule 1.2%. While the product rule was unable to improve on the best individual classifier, the sum and median rules both yielded better results. The assumption made by the sum rule, that posterior and prior probabilities will not differ much, is not very realistic, but the insensitivity of the method to estimation errors allows it to yield good accuracy rates. This work shows that combining individual classifiers of different features may improve the results of a multimodal classification problem.


G. EVALUATION CRITERIA 


Once the classifier has run, we must have a way to evaluate the results and compare those of different experiments. Standard performance metrics include precision, recall, F-score, and accuracy, which are explained in [26] and [30].


Precision measures the proportion of documents labeled as a particular class that actually belong to it: the number of documents correctly labeled as a class divided by the total number of documents labeled as that class.

Recall measures the proportion of documents belonging to a particular class that the classifier actually identified: the number of documents correctly labeled as a class divided by the total number of those documents in the data set. The formulas for precision and recall follow:

$$\text{precision} = \frac{TP}{TP + FP} \qquad \text{recall} = \frac{TP}{TP + FN}$$

where TP (true positives) is the number of documents correctly assigned to a class, FP (false positives) is the number of documents incorrectly assigned to that class, and FN (false negatives) is the number of documents of that class identified as members of another class.


The F-score combines these two measures into one metric balanced so that neither one affects the result more than the other. This prevents the experimenter from making design adjustments that favor one measure over the other. The F-score is the harmonic mean of precision and recall:

$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
Accuracy is a generalized measure of the performance of the classifier, finding the proportion of documents labeled correctly. It is obtained by dividing the number of correctly classified documents by the total number of documents in the set. While accuracy gives some indication of the effectiveness of the classifier, precision and recall do a better job of reflecting false positives and false negatives. False positives and false negatives are relevant in binary class problems but not in multiclass problems such as the one this research focuses on, meaning accuracy is a useful metric of evaluation.
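The four metrics can be computed from true and predicted labels with a short Python sketch. It treats each class one-versus-rest and, as an assumption, reports 0 when a denominator is zero (for example, precision for a class that was never predicted).

```python
def per_class_metrics(y_true, y_pred, cls):
    """Return (precision, recall, F-score) for one class, treated one-versus-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For true labels `a a b b c` and predictions `a b b b a`, accuracy is 0.6, while class `b` has precision 2/3, recall 1.0, and F-score 0.8.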
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III. TECHNIQUES 


A. INTRODUCTION 


This chapter describes the design and analysis of the experiments conducted over the course of this research. We first explain how the Twitter data was collected and processed to generate the corpus. Then we discuss the authorship attribution analysis of the text data. Next is a description of the signal collection process, followed by the device identification analysis of the signal data. Last, we detail the machine learning classifier combination scheme and analysis.


B. CORPUS GENERATION 
1. Twitter Streaming 


The text data for this research was collected from Twitter’s public streaming Application Programming Interface (API). This interface allows users to write programs to collect and filter Twitter status updates created by non-protected public accounts, including replies to other tweets, mentions of one user by another, and direct messages. A Twitter account is required to access the streaming API. To initiate a connection to the streaming API, the client forms an HTTP request to a Twitter server. Once the connection is established, the client consumes the resulting stream indefinitely. The connection may be closed by the user, or because of duplicate log-ins, server restarts, lag in the connection due to limited bandwidth or a slow client, or Twitter network maintenance [31].
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The streaming feed provides data in extensible markup 


language (XML) or JavaScript object notation (JSON) format. 





The stream can be filtered by any of the keys in the data 


structure, to include user ID, keyword, or geographic 





location. Twitter offers a service called Firehose, which 
delivers all public status update data for a fee. The free 
sample feed from the basic streaming API randomly samples 
1% of the Firehose stream. The exception is when 


conducting a following filter on a user ID, which has the 








effect of “following” that user, capturing all status 


updates associated with him [31]. 





2. Text Data Collection and Processing 


To build a representative, real-world, short-text messaging corpus, we collected the basic Twitter sample feed on a near-daily basis from June 16, 2010 to August 26, 2010. Collection of the feed occurred during weekdays and some nights and weekends. Any tweets in the stream flagged as retweets were removed in order to prevent associating text not written by a user with that user. The tweets were sorted into files by user ID. We manually discarded users with fewer than ten tweets, users that did not tweet in English, and users whose tweets appeared to be spam, news headlines, or overly repetitive. A goal of 50 users with over 500 tweets per user was set to provide a robust text corpus that would also allow comparison to the Twitter text analysis in [3]. From the group of “good” users, we selected the 53 most prolific and conducted further collection from November 8, 2010, to December 17, 2010, using the follow feed to obtain all tweets sent by, to, or referencing those users. Of the 53, we were able to obtain 50 authors active enough to meet our tweet quantity goal.

The initial sample feed collection resulted in 4045 tweets by 53 users. The follow feed collection boosted this value to 114,000 tweets. The tweets were processed to remove @names and #tags from the text and to discard any tweets with fewer than three words. Those short tweets tended to consist of emoticons or brief comments of approval, amusement, disgust, or other expressions. Removing the short tweets reduced the total tweet count to 97,090. Table 1 provides the total tweets and the maximum, minimum, and median tweets per author for each collection run and after processing.


Table 1. Collection Quantities

          | Sample Feed | Follow Feed | Processed
Minimum   | 60          | 290         | 278
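The filtering steps described above can be sketched in Python over tweets stored one JSON object per line. The field names (`text`, `user.id_str`, and a `retweeted_status` key present on retweets) follow Twitter's status JSON as we understand it and should be treated as assumptions, as should the regular expression for @names and #tags and the whitespace-based word count.

```python
import json
import re
from collections import defaultdict

MENTION_OR_TAG = re.compile(r"[@#]\w+")  # assumed pattern for @names and #tags


def clean_text(text):
    """Strip @names and #tags, then collapse the leftover whitespace."""
    return " ".join(MENTION_OR_TAG.sub(" ", text).split())


def tweets_by_user(json_lines, min_words=3):
    """Group cleaned, non-retweet tweets with at least min_words words by user ID."""
    by_user = defaultdict(list)
    for line in json_lines:
        status = json.loads(line)
        if "retweeted_status" in status:  # retweets carry text not written by this user
            continue
        text = clean_text(status["text"])
        if len(text.split()) >= min_words:
            by_user[status["user"]["id_str"]].append(text)
    return dict(by_user)
```

Checking the word count after stripping @names and #tags matches the order described above, so a reply consisting mostly of mentions is discarded along with the emoticon-only tweets.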





The next step in data preparation was to split the tweets into files by author. Each tweet constituted one line in the author file. To provide anonymity, the files were labeled with a randomly selected number code instead of the user ID. A line-by-line random shuffle was applied to each file to randomize the order of tweets. For the first set of experiments, the first 230 tweets of each file were taken as the text data set. As one tweet contains very little feature count information from which to build a profile, we tried combining several tweets into one document to represent the author. The 230 tweets were divided into ten documents of 23 tweets each to serve as the training set. In another set of experiments, each document contained only one tweet, for a training set of 230 documents. This treats each utterance independently in the subsequent classification process. A third experiment used 120 tweets from each author with one tweet per document in order to compare results with those in [3]. The text files were shuffled prior to extracting the 120 tweets, generating a different text set from the 230-tweet set. Other experiments were conducted with varying tweet quantities and training set sizes, which are explained in the Results section of this thesis.
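The shuffle-and-group step can be sketched in Python. The `seed` parameter is a hypothetical addition for reproducibility. With the defaults it yields the ten 23-tweet documents described above; with `n_tweets=120, per_doc=1` it yields the single-tweet variant. If `n_tweets` is not a multiple of `per_doc`, the last document is shorter.

```python
import random


def make_documents(tweets, n_tweets=230, per_doc=23, seed=None):
    """Shuffle one author's tweets, keep the first n_tweets, and group them into documents."""
    lines = list(tweets)
    random.Random(seed).shuffle(lines)
    sample = lines[:n_tweets]
    if len(sample) < n_tweets:
        raise ValueError("author has fewer tweets than requested")
    return [sample[i:i + per_doc] for i in range(0, n_tweets, per_doc)]
```

Shuffling with a different seed before drawing the 120-tweet sample reproduces the property noted above: the 120-tweet set is not simply a subset of the 230-tweet set.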
C. AUTHORSHIP ATTRIBUTION PROCESS 


Figure 8 shows a flowchart of the text processing and 


classification process. 
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Twitter Text Processing Split Text Generate Features 
Corpus Remove date/time, Divide author file and Counts 


@names, #tags, into sets of tweets. Build NPSML file of n- 


tweets with less than gram types and counts. 
three tokens. 








Cross-Fold 
Validation 






Train Classifier Training Set 


Build author models 
with naive bayes Splitinto 10% test data 


classifier. Test Set and 90% training data. 





Test Classifier 





Author Models 










Predict authorship of 
data in test set. 


Predicted 
Authors 


Figure 8. Naive Bayes Classification Process 


Probabilistic model for 
each author. 


















1. Feature Extraction 
From the data set, we derive the features used for classification. As explained in the previous chapter, character n-grams tolerate noise well and capture an author’s style and punctuation use, all important in classifying short messages like tweets. These experiments broke each tweet into character 2-, 3-, 4-, 5-, and 6-grams. The start and end of each post were indicated by a special boundary character attached before the first and after the last character in the post, to provide information on the placement of the n-gram. The same character also represented white space. All capitalization and misspellings were preserved. Table 2 shows the top five n-grams and their counts for each value of n in the entire corpus.





The n-grams are conceptually generated using a sliding window of size n moving over the utterance, recording and counting each n-character token. All punctuation marks and white spaces are included as characters. A software program parses each tweet and records the n-grams and counts associated with each author, saving them in a file in the NPSML format, one file for each value of n. The NPSML format is shown in Figure 9. The key field was the name of the file from which the feature labels and counts were derived. All weights were set to 1.0 for all files. The class field was set to the identifier code of the author of the utterance.


Key Weight Class FeatureLabel1 FeatureValue1 [FeatureLabel2 FeatureValue2 ...]\n





Figure 9. NPSML Format 
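A minimal Python sketch of the sliding-window extraction and an NPSML-style output line follows. The specific boundary character is an assumption here (an underscore). The sketch also assumes single spaces separate NPSML fields, as Figure 9 suggests, and sorts feature labels only to make the output deterministic. The key and class values in the usage note are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n, marker="_"):
    """Count character n-grams, with the marker at both ends and in place of whitespace."""
    body = "".join(marker if ch.isspace() else ch for ch in text)
    padded = marker + body + marker
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def npsml_line(key, cls, counts, weight=1.0):
    """Format one record: Key Weight Class Label1 Value1 Label2 Value2 ..."""
    fields = [key, str(weight), cls]
    for label in sorted(counts):
        fields += [label, str(counts[label])]
    return " ".join(fields)
```

Replacing whitespace with the marker also keeps feature labels free of spaces, so a space-delimited record such as `npsml_line("tweets_17.txt", "4821", char_ngrams("aa", 2))` remains unambiguous: `tweets_17.txt 1.0 4821 _a 1 a_ 1 aa 1`.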


Prior to running the classifier, we must split the feature count files into test and training sets. An internal line shuffle program randomized the order of the posts in the feature count files. Ten-fold cross validation was applied, in which a feature count file was split into ten subfiles, with nine used for training the classifier and one used for testing it. The nps-bTTSplit software program from the NPS Machine Learning Library [32] was used to generate the test and train files, ensuring each author was represented with an equal number of posts in each of the subfiles. None of the posts used in a training file were also used in the associated test file.
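Stratified ten-fold splitting of this kind can be sketched in Python. This is a hypothetical stand-in for illustration, not the nps-bTTSplit program. It deals each author's posts round-robin into the folds, so when every author has the same number of posts and that number is a multiple of the fold count, each fold holds an equal number of posts per author, and each test fold is disjoint from its training set.

```python
from collections import defaultdict


def stratified_folds(records, k=10):
    """records: list of (author, post) pairs. Return k (train, test) pairs."""
    by_author = defaultdict(list)
    for author, post in records:
        by_author[author].append((author, post))
    folds = [[] for _ in range(k)]
    for posts in by_author.values():
        for i, rec in enumerate(posts):
            folds[i % k].append(rec)  # round-robin keeps per-author counts balanced
    splits = []
    for i in range(k):
        train = [rec for j, fold in enumerate(folds) if j != i for rec in fold]
        splits.append((train, folds[i]))
    return splits
```

Because the posts are assumed to be shuffled beforehand, as described above, the round-robin assignment does not introduce any ordering bias into the folds.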


2. Naive Bayes Classifier 


These experiments used the Naval Postgraduate School Natural Language Processing Lab naive Bayes classification package. This software package uses the NPSML file format as input. The learning portion uses the smoothed feature counts from the input training data file to generate a probabilistic model. One set of experiments was conducted using Laplace add-one smoothing, and a second set using Witten-Bell smoothing. The classification program used the model generated by the learning program and the NPSML-formatted test data to determine the most probable class assignment for each test utterance. The program output the key and predicted class for each utterance in the test file. Each fold of the 10-fold cross validation was run and the outputs averaged for the final classification result. As each author has an equal number of tweets in the data set, prior probabilities for each author were fixed and equal.
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D. PATTERN OF LIFE ANALYSIS 


Human beings often fall into habitual daily routines. The act of communicating with others may fit into this routine, allowing an observer to discern a pattern. A user may log into his computer at the same time every weekday morning, or call his mother to chat during his commute home every evening. Analyzing a user’s communication patterns may aid in the identification of the user.





1. Twitter Time Analysis 


Each tweet collected includes a date/time field. Figure 10 shows the format of the date/time field. This analysis focused on a simple pattern analysis of send time by hour of the day, which may capture any user patterns centered on a work or school schedule.


Thu Nov 11 23:48:45 +0000 2010
(Day: Thu; Month: Nov; Date: 11; Hour: 23; Minute: 48; Second: 45; Time Zone: +0000; Year: 2010)

Figure 10. Date/Time Field Format


The date/time fields were stripped from each tweet and
saved in a separate file for each user. A line-by-line
shuffle program randomized the order of the timestamps.
The first 120 were taken from each author file as a sample
set. This sample set was then subdivided, grouping
timestamps into files labeled with the author as the class.
We used training set sizes of three, five, ten, and twelve
timestamps for testing. The files were processed into the
NPSML format, with the send hours and their counts as the
keys and values contained in each file. The NPSML files
were processed in the same manner as the text files. The
hours and counts were divided into test and training groups,
holding out 10% as a test set. The naive Bayes learning
and classify programs were run on a 10-fold cross
validation using Witten-Bell smoothing, and the results
were averaged to determine the final output class for each
test input. Other experiments were conducted with varying
timestamp quantities and training set sizes; these are
explained in the Results section of this thesis.
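The shuffle, sample, and group procedure above might look like the following sketch. It stops at the per-set hour counts that would then be written out as NPSML (serialization not shown). The function name, the fixed seed, and the choice to drop a final partial set are assumptions for illustration.

```python
import random
from collections import Counter


def build_training_sets(hours, sample_size=120, set_size=5, seed=0):
    """Shuffle one author's send hours, keep the first sample_size, and
    group them into sets of set_size, each reduced to {hour: count}."""
    rng = random.Random(seed)
    shuffled = list(hours)
    rng.shuffle(shuffled)                      # line-by-line shuffle
    sample = shuffled[:sample_size]            # first 120 as the sample set
    # Drop any remainder that cannot fill a complete set (an assumption).
    usable = len(sample) - len(sample) % set_size
    return [Counter(sample[i:i + set_size]) for i in range(0, usable, set_size)]
```

With 120 hours and a set size of five, each author yields 24 sets; a set size of seven yields 17 sets and leaves one hour unused.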


2. Social Network Analysis 


Another characteristic of an individual’s
communication patterns is the group of people with whom he
communicates. In a telephone or SMS network, discerning
this would require some access to signaling information or
service provider records. In Twitter, users often include
the screen name of the user they are specifically speaking
to or about in their tweet. Layton et al. noted a 26%
reduction in the accuracy of their authorship attribution
method when screen names were removed from the tweets. For
a simple social network analysis, we examined the screen
names referenced by the users in our corpus independent of
the text of their tweets.


The data processing used to conduct the social network
analysis was identical to that of the time analysis. Instead
of pulling the date/time field out of each tweet, any @name
found in parsing was saved to a file by author. The
shuffling, splitting, and grouping into training sets were
conducted using sets of three, six, ten, and twelve out of
120 @names per author. NPSML files of @names and their
counts per author were built. These were divided into test
and training sets, and naive Bayes learning and
classification with Witten-Bell smoothing was performed on
a 10-fold cross validation, with the results averaged to
give the final output class for each input.
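The @name extraction step could be done with a regular expression. This sketch assumes Twitter's screen-name rule (letters, digits, and underscore, at most 15 characters) and folds names to lower case because screen names are case-insensitive. The pattern and function names are hypothetical, not the parser used in this work.

```python
import re
from collections import Counter

# '@' followed by 1-15 screen-name characters, not preceded by a word
# character (so e-mail addresses such as x@y.com are skipped) and not
# followed by another screen-name character (so over-long names are ignored).
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])")


def mentions(tweet):
    """Return the lower-cased @names referenced in one tweet, in order."""
    return [name.lower() for name in MENTION.findall(tweet)]


def mention_counts(tweets):
    """Count @name references over a list of tweets (one author's file)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(mentions(tweet))
    return counts
```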


E. CELL PHONE SIGNAL ANALYSIS 


Based on the success of [4], we focused on signal
modulation features to build device characteristic vectors.
GSM modulation parameters are governed by the European
Telecommunications Standards Institute (ETSI) in their 3rd
Generation Partnership Project (3GPP) standards [33], [34].
Cell phone manufacturers test their products for quality
assurance purposes, ensuring phone users have an acceptable
link quality and that phones do not interfere with other
users. Three signal characteristics that are measured and
controlled are peak phase error, root mean square (RMS)
phase error, and frequency error. The Agilent 8922S GSM
Test Set is a signal analyzer designed to measure these
standard modulation characteristics of a GSM mobile
station.
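The 8922S computes these three quantities internally. To illustrate the measurement principle described in the GSM radio transmission specifications, the sketch below takes a sampled phase-error trajectory (measured phase minus ideal phase, in degrees), fits a least-squares regression line, reports the line's slope as the frequency error, and reports the RMS and peak of the residual about the line as the phase errors. The function name and inputs are hypothetical, and this is not the instrument's own algorithm.

```python
import math


def modulation_errors(t_s, phase_err_deg):
    """Estimate (frequency error in Hz, RMS phase error, peak phase error).

    t_s: sample times in seconds; phase_err_deg: measured minus ideal phase
    in degrees at those times. A least-squares line is fitted to the
    trajectory; its slope (degrees per second) divided by 360 is the
    frequency error, and the residual about the line gives the phase errors.
    """
    n = len(t_s)
    mean_t = sum(t_s) / n
    mean_p = sum(phase_err_deg) / n
    sxx = sum((t - mean_t) ** 2 for t in t_s)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(t_s, phase_err_deg))
    slope = sxy / sxx                      # degrees per second
    intercept = mean_p - slope * mean_t
    residual = [p - (intercept + slope * t) for t, p in zip(t_s, phase_err_deg)]
    rms = math.sqrt(sum(r * r for r in residual) / n)
    peak = max(abs(r) for r in residual)
    return slope / 360.0, rms, peak
```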


1. Signal Collection


The equipment used in conducting the mobile station
signal measurements was the Agilent 8922S GSM Test Set, the
LGS Innovations Tactical Base Station Router (TacBSR), and
an assortment of unlocked GSM-capable cell phones. The
8922S was run in test mode with a small whip antenna serving
as the RF input element. The use of an antenna required
adjustment of the expected input amplitude (in decibels).
This setting varied by phone and was based on Test Set
measurements of the phone output power and the presence or
absence of RF overload errors. Figure 11 shows the cell
control screen where these values were set.


Figure 11. Cell Control Screen (From [35])


In test mode, the 8922S transmits a GSM broadcast
signal on a specified frequency channel, or absolute radio
frequency channel number (ARFCN). It has a separate ARFCN
designated for the traffic channel the phone will use to
communicate with the BTS. Our phones could not connect to
the 8922S as a BTS with the antenna and SIM cards we had
available, so we used the TacBSR as the BTS. The
TacBSR was configured to operate in the E-GSM900 spectrum
using ARFCN 985 as the traffic channel. This corresponds to
an uplink frequency of 882.2 MHz. The 8922S was configured
as a midpoint collector listening to ARFCN 985. All the
phones we tested operated in this GSM band.
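The ARFCN-to-frequency mapping for the extended GSM 900 band, as defined in the 3GPP radio transmission specification, can be sketched as follows. E-GSM 900 channels are numbered 0-124 and 975-1023; the uplink (mobile transmit) frequency is 890 + 0.2n MHz for the first range and 890 + 0.2(n - 1024) MHz for the second, and the downlink is 45 MHz higher. The function names are hypothetical.

```python
def egsm900_uplink_mhz(arfcn):
    """Uplink (mobile transmit) carrier frequency in MHz for an E-GSM 900 ARFCN."""
    if 0 <= arfcn <= 124:
        return 890.0 + 0.2 * arfcn
    if 975 <= arfcn <= 1023:
        return 890.0 + 0.2 * (arfcn - 1024)
    raise ValueError(f"ARFCN {arfcn} is not an E-GSM 900 channel")


def egsm900_downlink_mhz(arfcn):
    """Downlink (base station transmit) frequency: uplink plus 45 MHz duplex spacing."""
    return egsm900_uplink_mhz(arfcn) + 45.0
```

For example, ARFCN 985 maps to an 882.2 MHz uplink and a 927.2 MHz downlink.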
Besides the amplitude of the input, two other settings
required adjustment before taking measurements: the traffic
channel timeslot and the trigger delay. When setting up a
call with the TacBSR, we noted the calling phone was
assigned timeslot 2 and the called phone was assigned
timeslot 3. To establish a traffic channel for
measurement, we had to establish a voice call between two
phones. The calling phone was noted, and the proper
traffic channel timeslot was set when conducting
measurements. The trigger delay sets the time delay between
a valid trigger event and the beginning of a measurement.
The 8922S uses the midamble of a GSM frame as a trigger, as
it is easy to detect. The Data Bits screen of the
Phase/Frequency page shows the bit sequence of the GSM
frame, highlighting the midamble. Figure 12 shows an example
of this screen. The trigger timing was set by observing the
value of the First Bit field and adjusting the trigger delay
to force that value close to zero.

Figure 12. Data Bits Screen (From [35])
Once the amplitude, timeslot, and trigger delay were
set, we measured the three modulation characteristics. A
more detailed explanation of the modulation measurements is
included in Appendix A. The measurements were made from the
Phase/Frequency page, shown in Figure 13. To get an
average value over a fixed number of transmission bursts
for each measurement, we used the multi-burst feature for
ten bursts. The measured phone was held near the antenna
of the 8922S, while the phone on the other end of the call
was placed across the room to minimize cross-channel
interference. We collected 30 values each of peak phase
error, RMS phase error, and frequency error, each averaged
over ten transmission bursts. The values were read from the
screen and recorded.


Figure 13. Phase and Frequency Error Screen (Multi-burst on)
(From [35])
The data collection method for the cell phone features
was sufficiently time and labor intensive that we used the
30 data samples of each feature from each phone as the
foundation for building a larger data set. The measured
values of each feature were input into a program that built
a probability density function based on a histogram of the
results. The program smoothed the samples by building a
histogram and then used the SciPy scipy.stats.gaussian_kde
class to create the probability density function. It then
drew a specified number of random values weighted by the
probability density [36]. We used this method to generate
160 more values representative of each handset.
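The resampling step can be sketched with a one-dimensional Gaussian kernel density estimate. With SciPy the call is `scipy.stats.gaussian_kde(samples).resample(160)`, which returns an array of shape (1, 160). The pure-Python version below reproduces what that call does by default (a Scott's-rule bandwidth, then drawing a stored sample at random and adding Gaussian noise of that width), and it skips the intermediate histogram the thesis program built. The function name is hypothetical.

```python
import random
import statistics


def kde_resample(samples, n_new, seed=None):
    """Draw n_new values from a 1-D Gaussian KDE fitted to samples.

    Mirrors scipy.stats.gaussian_kde defaults: bandwidth = sample standard
    deviation times Scott's factor n**(-1/5); each draw picks a stored
    sample uniformly at random and adds Gaussian noise of that bandwidth.
    """
    rng = random.Random(seed)
    bandwidth = statistics.stdev(samples) * len(samples) ** (-1 / 5)
    return [rng.choice(samples) + rng.gauss(0.0, bandwidth) for _ in range(n_new)]
```

For example, `kde_resample(freq_errors, 160, seed=1)` would extend a hypothetical list `freq_errors` of 30 measured readings for one phone by 160 synthetic values.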


2. Data Analysis and Classification 


The modulation characteristic data required some
preprocessing before use as an identification vector. The
mean and standard deviation of each characteristic were
calculated for each phone. The smallest standard deviation
of each feature across the phones was used as the bin size
for that feature, and the raw data was binned, generating
histograms for each phone. For example, if the standard
deviation of the frequency error for phone 1 was 8.2, phone
2 was 9.5, and phone 3 was 7.3, the bin size for frequency
error for the three phones would be 7.3. Binning the data
reduced noise from the measurement process and discretized
values from continuous domains to aid in feature counting
for the classification step. Each set of {peak phase error,
RMS phase error, frequency error} bin values for each
collection data point served as a feature vector for the
phone it was associated with.
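A sketch of the binning step, assuming fixed-width bins anchored at zero (the starting edge of the bins is not stated) and the sample standard deviation; all names are hypothetical.

```python
import math
import statistics


def bin_size(values_by_phone):
    """Bin width for one feature: the smallest per-phone standard deviation."""
    return min(statistics.stdev(v) for v in values_by_phone.values())


def to_bin(value, size):
    """Index of the fixed-width bin (anchored at 0) that contains value."""
    return math.floor(value / size)


def feature_vector(peak, rms, freq, sizes):
    """Discretize one {peak phase, RMS phase, frequency error} measurement."""
    return (to_bin(peak, sizes["peak"]),
            to_bin(rms, sizes["rms"]),
            to_bin(freq, sizes["freq"]))
```

With the example standard deviations above, the frequency-error bin size is min(8.2, 9.5, 7.3) = 7.3, so a reading of 15.0 falls in bin 2 and a reading of -12.0 in bin -2.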
For learning and classification, the feature vectors
were split into separate files. As with the time and name
analyses, we experimented with varying the size of the
training set. A software program turned each set of files
into an NPSML-formatted file of features and their counts.
The NPSML files were provided as input to the NPS naive
Bayes learning and classifying programs. The averaged
results of a 10-fold cross validation were given as the
output classes. We conducted experiments with varying
numbers of feature vectors and document sizes, which are
explained further in the next chapter of this thesis.
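The 10-fold cross validation used throughout can be sketched as below, written against hypothetical `train_fn` and `classify_fn` callbacks (for example, a naive Bayes learner and classifier). Assigning documents to folds by index modulo k is an assumption; it ensures every class appears in every test fold, provided each class has at least k documents.

```python
def ten_fold_accuracy(docs_by_class, train_fn, classify_fn, k=10):
    """Average accuracy over k folds.

    Fold i holds out documents whose index within their class is i mod k,
    trains on the rest, and scores the held-out documents.
    """
    accuracies = []
    for i in range(k):
        train = {c: [d for j, d in enumerate(ds) if j % k != i]
                 for c, ds in docs_by_class.items()}
        test = [(c, d) for c, ds in docs_by_class.items()
                for j, d in enumerate(ds) if j % k == i]
        model = train_fn(train)
        correct = sum(classify_fn(d, model) == c for c, d in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```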


F. COMBINING CLASSIFIERS 


Once the individual classification results were
obtained, we experimented with combining these results to
see if there was any subsequent improvement in accuracy.
The NPS nb-classify program can provide as its output the
logarithms of the probabilities for each class label.
Based on the availability of that information and the prior
work described in the previous chapter, we chose the sum
rule combination scheme. The sum rule takes the
probability outputs of a set of classifiers and adds the
probability values of each class label. The class label
with the maximum summed value is selected as the output
label.
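The sum rule as described can be sketched as follows. The per-class scores here stand for the log-probabilities each classifier outputs, and the function name is hypothetical. Because the values being summed are logarithms, the combined score equals the logarithm of the product of the classifiers' probabilities.

```python
def sum_rule(*score_dicts):
    """Combine {class: score} outputs from several classifiers.

    Adds each class's scores across classifiers (only classes scored by
    every classifier are kept) and returns (best class, combined scores).
    """
    labels = set.intersection(*(set(s) for s in score_dicts))
    combined = {c: sum(s[c] for s in score_dicts) for c in labels}
    return max(combined, key=combined.get), combined
```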


In the first set of experiments, one phone was
assigned to one author. The output probability logarithms
for each class label from the phone classifier were added
to the output probability logarithms of the text
classifier. The maximum combined value of each
classification test was taken as the result. We conducted
20 experiments rotating the author assignment for each
phone to verify the consistency of results across
phone-author pairings. The accuracy results for each pairing
were averaged to obtain the overall accuracy. Appendix E
contains the phone-author pairing matrix.


To mitigate the influence of the differences in
magnitudes of the text and device probability logarithms on
the summation result, these values were normalized across
the individual classes for each classifier output. The
normalized output of the signal classifier for a particular
phone was added to the normalized output of the text
classifier for its associated author. The experimentation
process was repeated on these values. The pattern of life
classifier results were included in another set of
experiments, adding their output values to the text and
device output values to attain a combined output value.
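The normalization formula is not given here. As one possibility, the sketch below rescales each classifier's per-class log-probabilities to [0, 1] with min-max normalization before summing, which keeps a classifier with large-magnitude log-probabilities (such as the text classifier) from dominating one with small-magnitude values (such as the signal classifier). The choice of min-max scaling and the function names are assumptions.

```python
def min_max_normalize(scores):
    """Rescale one classifier's {class: log-prob} output to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {c: 0.0 for c in scores}
    return {c: (v - lo) / (hi - lo) for c, v in scores.items()}


def normalized_sum(*score_dicts):
    """Normalize each classifier's output, then apply the sum rule."""
    normed = [min_max_normalize(s) for s in score_dicts]
    combined = {c: sum(n[c] for n in normed) for c in normed[0]}
    return max(combined, key=combined.get), combined
```

With text scores {A: -100, B: -110, C: -400} and phone scores {A: -3, B: -2, C: -9}, the raw sum selects A, while the normalized sum selects B, because the phone classifier's preference for B is no longer swamped by the larger text magnitudes.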








Another experiment was conducted to gauge the
effectiveness of the combined classifier at detecting a
change of author on a single phone. Using the same set of
20 authors and phones as above, the tweet text set was
modified to simulate a change of author. We chose two of
the 20 authors to swap. Out of the 50-tweet-per-author
data set, 10 tweets from each of the two authors were
labeled as the other author. The labeling scheme included
a flag so that we could identify the modified tweets after
classification. The modified test set was classified using
the classifier model trained previously. None of the
tweets in the test data had been used to train the model.
The results were normalized and added to the normalized
phone classifier results. The results of the combined
classifier were examined to determine if the modified
tweets were detected and appropriately classified.
This process was repeated using the 120-tweet-per-author
data set with a training set size of five. In this case,
25 tweets from each of the two authors were labeled as the
other author.


IV. RESULTS AND ANALYSIS

















A. TEXT RESULTS 

This section examines the authorship attribution
results from classification of the Twitter text corpus. We
first present the effects of varying the size of the
character n-gram in the feature set and the type of
smoothing used. Because a message of 140 characters or fewer
contains little feature information from which to build a
profile, we experiment with combining several individual
tweets from one author into a “document,” increasing the
total word count of the experimental unit of analysis by
using a set of multiple tweets rather than a single tweet,
then training and testing the classifier with these
“documents.” We experiment with classifying data sets
consisting of different total quantities of tweets per
author, combining these tweets into documents of varying
tweet count. We then test the effect on classifier accuracy
of changing the number of authors and the total number of
tweets per author in the data set.
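Character n-gram features and multi-tweet documents can be sketched as below. The assumptions here are that n-grams are taken within each tweet (not across tweet boundaries), that consecutive tweets are grouped, and that tweets left over after the last full document are dropped; the function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams of one string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def make_documents(tweets, tweets_per_doc, n=3):
    """Group consecutive tweets into documents; each document is the
    n-gram count table over all of its tweets."""
    docs = []
    for start in range(0, len(tweets) - tweets_per_doc + 1, tweets_per_doc):
        counts = Counter()
        for tweet in tweets[start:start + tweets_per_doc]:
            counts.update(char_ngrams(tweet, n))
        docs.append(counts)
    return docs
```

For example, 230 tweets grouped 23 per document yield ten documents, matching the ten-document configuration discussed below.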


Analysis of the Twitter text showed that the author
could be determined by a naive Bayes classifier at a rate
significantly better than chance. Table 3 shows the
accuracy results, averaged over a ten-fold cross validation,
of a multiclass classification of 50 authors using 230
tweets per author with character 2- through 6-grams as the
feature set. Results for Laplace add-one and Witten-Bell
smoothing are presented. These smoothing techniques are
explained in more detail in Chapter II, Section F. As
expected, Witten-Bell smoothing performed better than
add-one smoothing.


Table 3. Classification Accuracy Results for 50 Authors
With 230 Tweets Per Author
(Smoothing columns: Laplace, Witten-Bell)

To compare our results to those published in
[3], we performed the same analysis using 120 tweets per
author. The SCAP method shows better results than our
classifier. Table 4 compares our accuracy results with the
SCAP results obtained when their @name and #tag removal
preprocessor is applied.

Table 4. Classification Accuracy Results for 50 Authors
With 120 Tweets Per Author With Comparison to SCAP
Method
(Columns: Witten-Bell, SCAP [3])


The results presented thus far use a single tweet as a
document for classification purposes. Combining multiple
tweets in a document and using a set of these documents to
build the feature and count values for the training and
test inputs to the classifier improves the accuracy results
significantly. Table 5 shows the accuracy results of the
classifier averaged over the ten-fold cross validation for
50 authors with 230 total tweets per author divided into
ten documents of 23 tweets each, a value determined
empirically to provide the best accuracy results, as
described next.

Table 5. Classification Accuracy Results for 50 Authors
With 230 Tweets per Author Combined into Documents
of Size 23 Tweets

Grouping multiple tweets into a document improves the
accuracy of the classifier significantly. As the character
3-gram feature and Witten-Bell smoothing provided the best
results in early testing, we continued further testing
with those parameters fixed. The next set of tests
evaluated the effects of document size, in tweets, on
accuracy. To complete the 90%:10% train-to-test split, a
minimum of ten documents is required for classification.
We used ten documents with a range of five to 20 tweets per
document. Table 6 shows the results of that experiment.
Table 6. Classification Accuracy Results for 50 Authors
Using 10 Documents With Increasing Number of
Tweets per Document
(Columns: document size, total # tweets, accuracy)


Fixing the previous experiment at ten documents caused
the larger document sizes to use a proportionally larger
total number of tweets for classification. To determine
whether the accuracy improvement could be attributed to the
document size or to the total number of tweets in the data
set, we conducted another experiment in which we set the
total number of tweets at or near 150. In a situation
where the available number of messages per author is
limited, this distinction is important in designing an
accurate classification process. If the use of multi-tweet
documents enhances classifier accuracy in a fixed corpus
size, acceptable results may be obtained using fewer tweets
per author than if tweets are tested individually. The
document size was varied from five to 15 tweets per
document. The total number of tweets per classification
run was the multiple of the document size closest to 150.
Figure 14 shows the results of this experiment. The
document size range from one to five tweets was examined
further, finding the classification accuracy for 50, 100,
and 150 tweets across that range. Those results are
displayed in Figure 15.
Figure 14. Classification Accuracy for 50 Authors Using
150 Tweets per Author With Increasing Document
Size

(Series: 50, 100, and 150 tweets; x-axis: Document Size
(Tweets); y-axis: Accuracy)
Figure 15. Classification Accuracy for 50 Authors by
Document Size and Total Number of Tweets per
Author
The effects of changing the number of tweets per
author and changing the number of authors were evaluated in
more detail. The number of authors in each trial was
varied from two to 50, selected randomly from the set of 50
authors. The number of tweets per author was varied from 30
to 190 with a document size of one tweet. Figures 16 and 17
show graphs of the accuracy results over the range examined.
The accuracy curve appears to level off at about 22
authors.
Figure 16. Classification Accuracy Results for Various
Total Tweet Values Per Author With Increasing
Author Count

Figure 17. Classification Accuracy Results for Various
Author Counts With Increasing Total Tweet Per
Author Values





The classification accuracy curve for an increasing
number of authors in the data set levels out at 20 authors.
We conducted further experimentation with a set of 20
authors randomly selected from the 50-author data set. We
used total tweet per author values of 30, 50, 100, 120, and
150. The document sizes tested ranged from one to 15
tweets per document. Figures 18 and 19 show the classifier
accuracy results averaged over a ten-fold cross validation
for 20 authors with varying numbers of tweets per author
and tweets per document.


Figure 18. Classification Results for 20 Authors With
Varying Values of Tweets per Author and Tweets per
Document

(x-axis: Training Set Size (Tweets))
Figure 19. Classification Results for 20 Authors With
More Than 100 Tweets per Author and Varying Tweets
per Document





We next investigated whether the improvement in
classification accuracy generated by combining
multiple tweets into a document occurred during the
training or the testing of the classifier. Using the set
of 20 authors, we took the models built for 50 tweets per
author with five tweets per document, 100 tweets per author
with five and ten tweets per document, and 120 tweets per
author with five and 12 tweets per document, and tested a
new set of single tweets of the appropriate size on each
model. The results of these tests are presented in Table
7. The consistency of the accuracy results implies that the
added feature depth of the multi-tweet document generates
its accuracy benefits during the testing of the classifier
rather than the training.
Table 7. Classification Accuracy Results for Single Tweet
Documents Tested on Models Trained on Multi-Tweet
Documents of the Specified Quantity
(Columns: Tweets per Author, Document Size Trained On)

This section presented the results of a series of
classification experiments conducted on a Twitter corpus of
50 authors. We found that using character 3-grams as a
feature set and Witten-Bell smoothing produced the best
accuracy results. Classification accuracy improved as the
number of tweets per author increased, reaching 49.5%
accuracy at 230 tweets per author, a value based on the
smallest author data set in the corpus. Increasing the
feature depth of the text by combining multiple tweets into
a document and training and testing the classifier with the
multi-tweet documents improves classification accuracy
significantly, with accuracy levels reaching 90% at ten
tweets per document for 120 and 150 tweets per author and
99% at 23 tweets per document for 230 tweets per author.
Confusion matrices and per-author accuracy results for the
above tests are provided in Appendix B.


B. PATTERN OF LIFE RESULTS 


The pattern of life analysis builds a basic
description of an author’s tweeting habits by examining the
time of day he sends his messages. We use the hour of day
the message is sent as the feature used for classification.
As in the text classification process, we try to increase
the depth of the feature set by combining the send times of
multiple tweets into one training set and using the
<feature, count> values of the combined set as the input to
the classifier.


Analysis of the hour of day the users tweet showed
that the author of a tweet could be determined by a naive
Bayes classifier at an accuracy rate just slightly better
than chance. We used the send hour (GMT) of 120 tweets for
each author as the time value. Because the send time is
reported in hour:minute:second format, keeping only the hour
bins the times into 24 bins, one per hour. The 120 hour
values were split into training sets, similar to grouping
multiple tweets into documents as in the previous section.
We experimented with training set sizes ranging from one to
12 tweet hours per set. As with the message text, accuracy
improved when grouping multiple tweet times into a training
set. The training sets are input into the naive Bayes
classifier and trained and tested using Witten-Bell
smoothing. Figure 20 shows the classification accuracy
results averaged over a ten-fold cross validation for 50
authors with 120 tweet time values per author grouped into
training sets ranging in size from one to 12 tweet time
values per set.
Figure 20. Classification Accuracy for 50 Authors of
120 Tweet Times per Author With Increasing Number
of Tweet Times per Training Set





Compared to the text classification results, time of
day was not an effective way to discriminate between
authors. As the testing focused on English speakers, it is
possible that many of the users were located in similar
time zones, and thus maintained similar schedules. A few
authors could be classified with very good results, with
two authors identified at over 60% accuracy over 120 tweets
with a training set size of three. Table 8 shows the
accuracy result for each author for 120 tweets per author
and training set sizes of three, five, ten, and 12 tweet
hours per set. Histograms of the tweet send times for each
author are presented in Appendix C.
Table 8. Classifier Accuracy Results for Each Author Using
120 Tweet Times per Author and Increasing Number
of Tweet Times per Training Set
(Columns: Author; accuracy at 3, 5, 10, and 12 Tweet Hours
per Training Set)





C. SOCIAL NETWORK RESULTS


Analysis of the social network of the authors as
determined by the @names referenced in their tweets
provided excellent accuracy results. The corpus contained
a total of 72,888 references to 6,105 unique @names. The
least connected author, gauged by the author’s ratio of
@names to tweets in the corpus, made 1020 @name references
in 9004 tweets, while the most connected author made 1570
@name references in 1174 tweets. We extracted the @names
from each author’s tweets and selected 120 from each
author. The @names were used as the features input to the
naive Bayes classifier using Witten-Bell smoothing. As with
the text and time data, we experimented with combining
multiple @names into a training set to increase the depth
of the experimental unit. Accuracy improved when the
@names drawn from multiple tweets were combined into one
training set and this training set was used to generate the
feature and count data. The averaged results of a ten-fold
cross validation classification of 120 @names per author
for 50 authors with a training set size ranging from three
to 12 @names per set are presented in Figure 21.





Figure 21. Classification Accuracy Results for Social
Network Analysis of 120 @names per Author With
Increasing Number of @names per Training Set
The very high classification accuracy rate implies
that the authors randomly chosen for this study did not
interact with each other or have many common acquaintances.
In a practical application, this lack of interconnectivity
may not apply. A work group or a criminal cell may have a
large number of common nodes in their social network,
making this sort of analysis less effective. For this
study, the social network proved to be too discriminative,
and we conducted no further experimentation with @names.
In future work, we plan to examine the accuracy of the
social network as a function of the number of
classification classes, or authors.





D. PHONE SIGNAL ANALYSIS 


This section presents the results of the
classification of modulation characteristics collected from
cell phone signals in an effort to correctly identify the
specific device transmitting the signal. We form the three
measured modulation characteristics (peak phase error, RMS
phase error, and frequency error) into a feature vector,
and then use a naive Bayes classifier to predict the device
associated with a set of signal feature vectors. As with
the previous experiments, we combine multiple signal
feature vectors into a training set in order to improve
classification results by increasing the depth of the
feature context in each training set. The classifier is
trained and tested with a variety of total signal vector
counts per device, and different quantities of signal
vectors per training set.
Analysis of the phone signal modulation
characteristics showed that devices could be identified by
the naive Bayes classifier at an accuracy level well above
random chance. Figure 22 shows the accuracy results
averaged over a ten-fold cross validation for 20 phones
with a total of 30 to 150 signal feature vectors per device
and a training set size of one to five vectors per training
set.


Figure 22. Classification Accuracy for 20 Devices With
Varying Vectors per Training Set and Total Vectors
per Device


Training set sizes larger than five vectors per set
were explored using 150 total data vectors. Figure 23
shows the results of these experiments for 20 phones. As
the training set size increases, the total number of
classification results per phone decreases, giving each
incorrect classification a larger impact on the accuracy
result. For example, 150 data vectors divided into five
per training set gives 30 sets for classification.
Incorrectly classifying one of these sets yields a
96.7% accuracy result. When 150 data vectors are grouped
into training sets of size 15, ten sets are created for
classification. Incorrectly classifying one set yields a
90% accuracy result. Figure 23 reflects this phenomenon.
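The granularity effect follows from simple arithmetic. With $N$ data vectors per phone and $k$ vectors per training set, each phone contributes $N/k$ sets for classification, so a single misclassified set lowers that phone's accuracy by $k/N$:

$$
\text{accuracy} = 1 - \frac{1}{N/k} = 1 - \frac{k}{N},
\qquad 1 - \frac{5}{150} \approx 0.967,
\qquad 1 - \frac{15}{150} = 0.90 .
$$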


Figure 23. Classification Accuracy Results for 20
Phones With 150 Data Vectors and Varying Document
Size


Confusion matrices and accuracy results per phone are
included in Appendix D. In future work, we wish to
investigate which features of the phone signals provide the
most discriminatory classification power.





E. COMBINED CLASSIFIERS 


Combining the outputs of the individual classifiers improved upon the authorship attribution results of the individual text classifier. Per the sum-rule combination scheme discussed in Chapter II, our experiments added the output probability logarithms of the individual phone and text classifiers, averaged over ten-fold cross validation. The text data sets used in the experiments were the classification results from the 20 authors using 30, 50, 100, 120, and 150 tweets per author and document sizes of one to 15 tweets per document. The phone data sets used were the classification results from 20 phones using the same number of signal vectors per phone and signal vectors per training set as the text data sets. We repeated this process for selected data sets after normalizing the output probability logarithms. Figures 24-28 show the accuracy results of the individual and combined classifiers for the five data sets. Appendix E contains the accuracy results for each author-phone pairing tested.
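The sum-rule combination can be sketched as follows. Each phone is paired with one author, the combined score of a pair is the phone classifier's log probability plus the text classifier's log probability, and the predicted pair is the one with the largest sum. The dictionary layout and function names are hypothetical, and the normalization shown (dividing each classifier's log probabilities by the absolute value of their sum over all labels) is an assumption for illustration rather than a documented detail of this research:

```python
from typing import Dict, List, Tuple


def normalize(log_probs: Dict[str, float]) -> Dict[str, float]:
    """Assumed normalization: divide each log probability by |sum over labels|."""
    total = abs(sum(log_probs.values()))
    return {label: value / total for label, value in log_probs.items()}


def sum_rule(
    phone_log_probs: Dict[str, float],
    text_log_probs: Dict[str, float],
    pairs: List[Tuple[str, str]],
    normalized: bool = False,
) -> Tuple[str, str]:
    """Return the (phone, author) pair with the largest summed log probability."""
    if normalized:
        phone_log_probs = normalize(phone_log_probs)
        text_log_probs = normalize(text_log_probs)
    return max(pairs, key=lambda p: phone_log_probs[p[0]] + text_log_probs[p[1]])


# Hypothetical outputs: the text classifier's values are far more negative.
phone = {"p1": -10.0, "p2": -30.0}
text = {"a1": -1000.0, "a2": -975.0}
pairs = [("p1", "a1"), ("p2", "a2")]
print(sum_rule(phone, text, pairs))                   # ('p2', 'a2'): text dominates
print(sum_rule(phone, text, pairs, normalized=True))  # ('p1', 'a1'): phone dominates
```

In this toy case the raw sum follows the text classifier, while the assumed normalization shifts the decision toward the phone classifier, the same kind of shift discussed for Tables 9 and 10.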


Figure 24. Classification Accuracy of Individual and Combined Classifiers for 30 Tweets/Signal Vectors and Various Training Set Sizes

Figure 25. Classification Accuracy of Individual and Combined Classifiers for 50 Tweets/Signal Vectors and Various Training Set Sizes

Figure 26. Classification Accuracy of Individual and Combined Classifiers for 100 Tweets/Signal Vectors and Various Training Set Sizes

Figure 27. Classification Accuracy of Individual and Combined Classifiers for 120 Tweets/Signal Vectors and Various Training Set Sizes

Figure 28. Classification Accuracy of Individual and Combined Classifiers for 150 Tweets/Signal Vectors and Various Training Set Sizes
Normalizing the output probability logarithms before adding them together results in accuracy values superior to either of the individual classifiers. The probability logarithms output by the text classifier are much more negative than those of the phone classifier (roughly -1,500 versus -50 in Table 9), with a larger variance between label values because of the much larger text feature space. Table 9 shows an example using author 2744 and phone htc376.


Table 9. Comparing Combination of Probability Logarithms and Combination of Normalized Probability Logarithms

| Phone | Author | htc376 log prob. | 2744 log prob. | Sum | htc376 normalized | 2744 normalized | Normalized sum |
| bberry | 1045 | -58.390 | -1536.188 | -1594.578 | -0.0579 | -0.0498 | -0.1077 |
| htc374 | 1921 | -32.916 | -1524.800 | -1557.716 | | | -0.0821 |

[The remaining rows of Table 9 are illegible in the source.]


The text classifier incorrectly selects author 1388 as the most probable class, with a probability logarithm 34.691 greater than that of the actual class. The phone classifier correctly selects htc376, but its probability logarithm is only 25.043 greater than that of htc371, the phone associated with author 1388. Thus, the combined classifier selects the htc371-1388 pair. Once the probability logarithms are normalized, the relative variation between class labels decreases. The text classifier's value for 1388 is only 0.0011 greater than its value for 2744, while the value for htc371 is now 0.0249 less than the correct value for htc376. Thus, the combined classifier outputs the correct htc376-2744 pairing based on the strength of the phone classifier.
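The reversal in this example depends only on the margins quoted above, so it can be checked directly. In this sketch each classifier's runner-up value is set to zero, an illustrative simplification (only differences matter for the arg-max), and `best_pair` is a hypothetical helper rather than code from this research:

```python
def best_pair(phone, text, pairs):
    """Pick the (phone, author) pair with the largest summed score."""
    return max(pairs, key=lambda p: phone[p[0]] + text[p[1]])


pairs = [("htc376", "2744"), ("htc371", "1388")]

# Raw probability logarithms, expressed relative to each classifier's runner-up.
raw_text = {"1388": 34.691, "2744": 0.0}        # text favors 1388 by 34.691
raw_phone = {"htc376": 25.043, "htc371": 0.0}   # phone favors htc376 by 25.043

# Normalized values, using the margins quoted in the text.
norm_text = {"1388": 0.0011, "2744": 0.0}
norm_phone = {"htc376": 0.0249, "htc371": 0.0}

print(best_pair(raw_phone, raw_text, pairs))    # ('htc371', '1388'): wrong pair
print(best_pair(norm_phone, norm_text, pairs))  # ('htc376', '2744'): correct pair
```

Without normalization the text margin (34.691) outweighs the phone margin (25.043); after normalization the phone margin (0.0249) outweighs the text margin (0.0011).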


Another example using the same phone-author pair demonstrates how normalizing the probability logarithms can reduce the combined classifier accuracy. Table 10 shows a set of classifier outputs in which an incorrect phone classification causes the normalized probability logarithm combination to select an incorrect phone-author pair, while the non-normalized output combination selects the correct pair.
Table 10. Normalized Probability Logarithm Combination Resulting in Incorrect Classification

In this case, the output probability logarithm combination provided the correct classification based on the strength of the text classifier, but the normalized outputs emphasized the phone classifier and produced an incorrect classification result. In future work we wish to investigate different classifier combination mechanisms to evaluate in more detail the effects of the individual classifier inputs on the combined result, and find ways to balance these effects on the final result.


F. DETECTING AUTHOR CHANGES 


In this section, we test whether the combined classification system we have built is able to detect a simulated change in user. For example, a criminal may use a cell phone for a period of time, then sell the phone to someone else and get a new one in order to elude anyone who may be tracking the old phone. Our research question asks: in the absence of any other knowledge about the target user and device, can our classifier detect that someone new is using the phone? We address this by simulating a “change” in author in the combination scheme: we exchange tweets between two of our 20 authors and analyze whether the previously trained models of the text and combined classifiers can detect and correctly classify the mislabeled tweets in the test set.
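The swap procedure can be sketched as below: a chosen number of test tweets is exchanged between two authors, and each moved tweet takes the label of the author it was moved to, so a detector should flag it as not belonging to its label. The function name, the data layout, and the choice to swap the first k tweets are assumptions for illustration:

```python
from typing import Dict, List, Tuple

Labeled = List[Tuple[str, bool]]  # (tweet text, was_swapped)


def swap_tweets(tweets: Dict[str, List[str]], a: str, b: str, k: int) -> Dict[str, Labeled]:
    """Exchange the first k tweets of authors a and b.

    Returns, per labeled author, (tweet, swapped) pairs where swapped=True marks
    a tweet actually written by the other author.
    """
    out: Dict[str, Labeled] = {
        author: [(t, False) for t in texts] for author, texts in tweets.items()
    }
    moved_a = [(t, True) for t in tweets[a][:k]]  # a's tweets, now labeled b
    moved_b = [(t, True) for t in tweets[b][:k]]  # b's tweets, now labeled a
    out[a] = moved_b + out[a][k:]
    out[b] = moved_a + out[b][k:]
    return out


corpus = {"7958": ["a1", "a2", "a3"], "9417": ["b1", "b2", "b3"], "1045": ["c1"]}
print(swap_tweets(corpus, "7958", "9417", 2)["7958"])
```

Keeping the `swapped` flag alongside each tweet makes it straightforward to score the "Original" and "Swapped" rows separately afterwards.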


Analysis of the combined classifiers, with a simulated change of author in two author-phone pairings, showed that the classifier combination could detect the change in tweet author less than 40% of the time. Tweets from authors 7958 and 9417 were exchanged to simulate a change in user on a phone. Testing was conducted using a set of 50 tweets per author with training set sizes of one and three tweets per set, and with a set of 120 tweets per author with a training set size of five tweets per set. The text classifier alone was able to detect the change up to 100% of the time, but its classification accuracy on the unaltered text (the true positives) was also rather low.





Tables 11-13 show the text-only classification confusion matrices for the two affected authors. The “Original” row shows the results for the unaffected tweets. The “Swapped” row shows the results for the tweets that were exchanged between the authors, listed under the labeled author. The true positives are highlighted. Table 14 lists the accuracy rates these matrices display and compares them to the accuracy over all 20 authors.
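Under one reading of these tables, the true positive rate is the fraction of original (unswapped) documents assigned to their labeled author, and the false positive rate is the fraction of swapped documents still assigned to the labeled author, that is, author changes that go undetected. A sketch under that assumption, with a hypothetical row format (predicted author mapped to document count):

```python
from typing import Dict, Tuple


def rate_for_label(row: Dict[str, int], labeled_author: str) -> float:
    """Fraction of a confusion-matrix row assigned to the labeled author."""
    total = sum(row.values())
    return row.get(labeled_author, 0) / total if total else 0.0


def change_detection_rates(original_row: Dict[str, int],
                           swapped_row: Dict[str, int],
                           labeled_author: str) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for one labeled author."""
    tp = rate_for_label(original_row, labeled_author)
    fp = rate_for_label(swapped_row, labeled_author)  # swaps that went undetected
    return tp, fp


# Hypothetical counts for one labeled author.
print(change_detection_rates({"7958": 10, "1045": 2, "9417": 8},
                             {"9417": 4, "7958": 1}, "7958"))
```

Under this reading, the detection rate for swapped documents is one minus the false positive rate.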


Based on previous testing, adding the phone classifier output to the text classifier output should improve the true positive rate. Our analysis shows that the true positive rate does improve, but the false positive rate also increases. When the output probability logarithms are combined, a small difference in accuracy between the actual and injected text is noted, with author 7958 more distinct than author 9417. When the normalized output probability logarithms are combined, no difference between the actual and injected text can be detected. Tables 15-20 are the combined classifier confusion matrices for the two affected authors, with counts summed over all 20 phone-author pairs tested. The “Original” row shows the results for the unaffected tweets. The “Swapped” row shows the results for the tweets that were exchanged between the authors, listed under the labeled author. The true positives are highlighted. Table 21 shows the accuracy results averaged over all 20 author-phone pairings.
Table 11. Confusion Matrix for Text Classifier for Simulated Author Change Using 50 Tweets per Author With One Tweet per Document With Ten Tweets Exchanged Between Authors


Table 12. Confusion Matrix for Text Classifier for Simulated Author Change Using 50 Tweets per Author With Three Tweets per Document With Nine Tweets Exchanged Between Authors
Table 13. Confusion Matrix for Text Classifier for Simulated Author Change Using 120 Tweets per Author With Five Tweets per Document With 25 Tweets Exchanged Between Authors


Table 14. Classification Accuracy of Text Classifier for Simulated Author Change - True Positives (non-swap) and False Positives (Swap)
Table 15. Confusion Matrix for Normalized Combined Classifier for Simulated Author Change Using 50 Tweets per Author With One Tweet per Document With Ten Tweets Exchanged Between Authors





Table 16. Confusion Matrix for Normalized Combined Classifier for Simulated Author Change Using 50 Tweets per Author With Three Tweets per Document With Nine Tweets Exchanged Between Authors
Table 17. Confusion Matrix for Normalized Combined Classifier for Simulated Author Change Using 120 Tweets per Author With Five Tweets per Document With 25 Tweets Exchanged Between Authors





Table 18. Confusion Matrix for Non-normalized Combined Classifier for Simulated Author Change Using 50 Tweets per Author With One Tweet per Document With Ten Tweets Exchanged Between Authors


Table 19. Confusion Matrix for Non-normalized Combined Classifier for Simulated Author Change Using 50 Tweets per Author With Three Tweets per Document With Nine Tweets Exchanged Between Authors





Table 20. Confusion Matrix for Non-normalized Combined Classifier for Simulated Author 
Change Using 120 Tweets per Author With Five Tweets per Document With 25 Tweets 
Exchanged Between Authors 
Table 21. Classification Accuracy of Combined Classifiers for Detecting Simulated Author Change - True Positives (non-swap) and False Positives (Swap)

| Author | Row | Non-normalized Outputs | | | Normalized Outputs | | |
| 9417 | Swap | 0.0600 | 0.6667 | 0.8700 | 0.8200 | 0.9667 | 1.0000 |

[The remaining rows of Table 21 are illegible in the source.]
V. CONCLUSIONS


A. SUMMARY 
This thesis asked two questions. First, can a multimodal naive Bayes classifier, combining user-specific text authorship characteristics and device-specific signal characteristics, improve on the accuracy results of a text classifier alone, especially for short messages? Second, can such a classifier detect if a phone normally used by one individual is unexpectedly used by a different individual? Our results show that the answer to the first question is yes, while the answer to the second is that it is possible, but our method requires further refinement to improve accuracy.


In our text classification experiments, classification of 120 individual Twitter messages per author from 50 authors using a multiclass naive Bayes classifier produced 40.3% authorship attribution accuracy, less than the 54.4% found by Layton, Watters, and Dazeley using the Source Code Author Profiles (SCAP) method [3], the most comparable related work to our own. However, combining multiple tweets to generate a text feature vector for input to the classifier improves authorship attribution accuracy. Using a feature vector from 23 combined messages produces the best result of 99.6% accuracy.
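Combining tweets into a single feature vector can be sketched as grouping an author's tweets k at a time and counting features over the merged text. Lower-cased word counts stand in here for the actual text features, which this section does not specify, and the function name is hypothetical:

```python
from collections import Counter
from typing import List


def documents_from_tweets(tweets: List[str], k: int) -> List[Counter]:
    """Group an author's tweets k at a time; return one word-count vector per group.

    A trailing group with fewer than k tweets is kept as a shorter document.
    """
    docs = []
    for start in range(0, len(tweets), k):
        merged = " ".join(tweets[start:start + k])
        docs.append(Counter(merged.lower().split()))
    return docs


print(documents_from_tweets(["Hi there", "hi again", "bye"], 2))
```

Larger k gives each document a richer feature vector but leaves fewer documents to classify, the trade-off seen throughout the experiments.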


Analysis of a user’s message communication pattern by the time of day they sent tweets did not produce a good classifier. In the best case tested, using the send times of 120 tweets per author from 50 authors combined into 12 tweets per training set, the classification accuracy was 35%. It is possible that by selecting for English speakers we obtained a set of authors living in similar time zones.
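The send-time data can be pictured as a 24-bin histogram of posting hours in GMT, the form shown in Appendix C. A sketch, assuming send times are available as timezone-aware `datetime` objects; the function name is ours:

```python
from datetime import datetime, timezone
from typing import Iterable, List


def hour_histogram(send_times: Iterable[datetime]) -> List[int]:
    """Count tweets per hour of day (0-23) after converting each time to UTC/GMT."""
    counts = [0] * 24
    for t in send_times:
        counts[t.astimezone(timezone.utc).hour] += 1
    return counts


times = [
    datetime(2011, 5, 1, 13, 5, tzinfo=timezone.utc),
    datetime(2011, 5, 2, 13, 59, tzinfo=timezone.utc),
    datetime(2011, 5, 2, 2, 0, tzinfo=timezone.utc),
]
print(hour_histogram(times))
```

If most authors share similar time zones, these histograms will look alike across authors, which is consistent with the weak result above.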





The social network analysis, classifying authors by the @names mentioned in their tweets, performed extremely well. We attained 94% accuracy classifying 120 @names per author from 50 authors combined into 3 @names per training set, with accuracy improving as training set size increased. The random selection of authors for the study likely chose users unrelated to each other, with distinctive social networks that enabled high classification accuracy; this suggests that the performance of such approaches may decrease as the author set size increases.
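Extracting the @names used as social network features can be done with a regular expression. A minimal sketch; the pattern (1 to 15 letters, digits, or underscores after the @, not preceded by a word character) reflects Twitter's username rules as we understand them and is an assumption about how @names were parsed here:

```python
import re
from collections import Counter
from typing import Iterable

# Assumed @name syntax; the lookbehind skips e-mail addresses such as x@example.com.
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])")


def mention_counts(tweets: Iterable[str]) -> Counter:
    """Count the @names an author mentions, case-insensitively."""
    counts = Counter()
    for tweet in tweets:
        counts.update(name.lower() for name in MENTION.findall(tweet))
    return counts


sample = ["Thanks @Alice and @bob!", "@alice see you", "mail me at x@example.com"]
print(mention_counts(sample))
```

The resulting counts can serve as a bag-of-@names feature vector in the same way as word counts.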
The device identification portion of the research performed very well. Classification of 120 individual cell phone radio signal modulation characteristic vectors for 20 GSM cell phones resulted in 90% classification accuracy. This compares favorably with the 99% accuracy of Brik et al. for modulation characteristics of 802.11 devices [4]. Combining the signal vectors into training sets of five signal vectors per set improved classification accuracy to 99%.





Sum-rule combination of the text and phone classifiers, adding the probability logarithm outputs of the individual classifiers, improves upon the results of the text classifier. The multimodal classifier performed better than the text classifier in every experiment because the high device identification accuracies influenced the combined accuracy result. For 20 author-phone pairs with 120 tweets/signal vectors per pair, the multimodal classifier accuracy was 60%. When the tweets/signal vectors were combined into 5 per training set, the multimodal classifier accuracy surpassed 98%. Predictably, summing the classifier outputs produces better accuracy results when the individual classifier accuracy results are also high.


The phone user change simulation testing showed that the multimodal classifier could not reliably detect if tweets from two of the authors were exchanged to simulate a different author in a phone-author pairing. The text classifier alone achieved the best results in detecting author change, with a false positive rate of 0% and a true positive rate of 52.6% for one author, and a false positive rate of 20% and a true positive rate of 73.7% for the other. These figures were obtained using 120 tweets per author grouped into training sets of five tweets per set. The multimodal classifier results on the same data set were a false positive rate of 87% and a true positive rate of 95% for one author, and a false positive rate of 64% and a true positive rate of 98.4% for the other. This indicates that the phone classifier results are skewing the multimodal classifier to favor the phone detection. A more accurate text classifier may produce better author change detection results.


These results suggest that the classification of the user-device binding is feasible. It could be employed as a secondary security layer for a business or government cell phone management scheme to detect unauthorized phone use or the loss or theft of a phone. In a law enforcement context, this method could help verify the author of SMS messages sent from a suspect’s phone. With improvement of the author change detection method, it may help detect when a suspect ceases to use or sells a temporary, or “burner”, phone. Authorship attribution of short messages is a difficult problem, but we have shown that a multimodal classifier can improve upon the current state of the art.


B. FUTURE WORK 


This research suggests a number of avenues for further research in authorship attribution of short messages.


1. Social Network Analysis 


The social network analysis conducted here was superficial, but showed potentially highly effective results. Future work could build a new Twitter corpus, possibly using some of the authors here as a basis. Follow-feed collection from a starting set of selected authors would gather tweets to and from users with whom they are routinely in contact; this process could then be repeated to expand their networks. A larger set of interconnected users could be built through this discover-and-collect method. Once a satisfactorily sized corpus is built, the text-based authorship attribution methods used in this research could be repeated.
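The discover-and-collect procedure amounts to a breadth-first expansion from a seed set of authors. A sketch in which `get_contacts` is a hypothetical stand-in for whatever collection step returns the users an author routinely exchanges tweets with; no real Twitter API call is implied:

```python
from collections import deque
from typing import Callable, Iterable, Set


def snowball(seeds: Iterable[str],
             get_contacts: Callable[[str], Iterable[str]],
             max_users: int) -> Set[str]:
    """Breadth-first discovery of users, starting from seed authors.

    get_contacts is a hypothetical callback returning a user's frequent contacts.
    Stops once max_users users have been discovered.
    """
    seen: Set[str] = set()
    queue = deque()
    for s in seeds:
        if s not in seen and len(seen) < max_users:
            seen.add(s)
            queue.append(s)
    while queue and len(seen) < max_users:
        user = queue.popleft()
        for contact in get_contacts(user):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
                if len(seen) >= max_users:
                    break
    return seen


graph = {"a": ["b", "c"], "b": ["d"], "c": ["a", "e"], "d": [], "e": []}
print(sorted(snowball(["a"], lambda u: graph.get(u, []), 10)))
```

The `max_users` cap is one simple way to decide when the corpus is "satisfactorily sized".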


2. Other Machine Learning Methods 


This research used naive Bayes classification for every data type. Future research could apply other machine learning techniques, particularly SVM, to improve accuracy results. The binning conducted to discretize continuous variables in the phone signal collection and in the time analysis may hurt accuracy results; SVM is better suited for machine learning of continuous variables. This research used a multiclass classifier. Developing an effective one-of-many classifier may have more practical applications when searching for a specific individual in an undefined population set.
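To illustrate the discretization at issue, equal-width binning maps each continuous value to one of a fixed number of intervals, and any variation inside an interval is invisible to the classifier. The bin count and range below are hypothetical; the binning scheme actually used in this research is not reproduced here:

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int, lo: float, hi: float) -> List[int]:
    """Map each value to a bin index in [0, n_bins - 1] over the range [lo, hi].

    Values outside the range are clipped into the first or last bin.
    """
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        idx = int((v - lo) // width)
        bins.append(min(max(idx, 0), n_bins - 1))
    return bins


print(equal_width_bins([0.0, 1.9, 2.0, 9.99, 10.0, -3.0], 5, 0.0, 10.0))
```

For instance, with a bin width of 2, values of 2.1 and 3.9 fall into the same bin and become indistinguishable, whereas a classifier working on continuous inputs, such as an SVM, would still see the difference.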


Another potential research avenue would be to further tune the multimodal classifier, experimenting with different classifier combination schemes and possibly using input weighting to mitigate the heavy influence of the phone signal results on the multimodal results.
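Input weighting could take the form of a weighted sum rule, in which a weight w between 0 and 1 scales the phone classifier's contribution and 1 - w scales the text classifier's. A sketch with hypothetical names; how w should be chosen (for example on held-out data) is left open, as in the text:

```python
from typing import Dict, List, Tuple


def weighted_sum_rule(phone: Dict[str, float], text: Dict[str, float],
                      pairs: List[Tuple[str, str]], w: float) -> Tuple[str, str]:
    """Pick the (phone, author) pair maximizing w * phone + (1 - w) * text scores."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return max(pairs, key=lambda p: w * phone[p[0]] + (1.0 - w) * text[p[1]])


phone_scores = {"p1": 0.0, "p2": 1.0}
text_scores = {"a1": 1.0, "a2": 0.0}
pairs = [("p1", "a1"), ("p2", "a2")]
print(weighted_sum_rule(phone_scores, text_scores, pairs, 0.9))  # phone-weighted
print(weighted_sum_rule(phone_scores, text_scores, pairs, 0.1))  # text-weighted
```

Setting w = 0.5 recovers the plain sum rule up to a constant factor.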


3. Expanded Phone Signal Analysis


We used three easily obtained modulation characteristics of the cell phone signal to conduct our classification testing. Future research could determine which of these three characteristics is the most discriminative. Other signal characteristics such as bit error rate and signal ramp time could also be explored.


An additional research area in the phone signal analysis would be to develop a means for measuring these signal characteristics with a software defined radio system. The test equipment used in our research is not practical for field application of phone signal analysis. A software defined radio receiver would be a more transportable and covert collection asset.


4. Segmentation Inside Boundaries 


The author change in phone-author pairings experiment conducted here could be expanded upon. Our experiment used the technique of combining multiple tweets into a document and classifying the feature vector of that document in order to increase the feature space and improve classification accuracy. In the change of author experiment, all the tweets in one document belonged to the same author, and the author change was simulated by exchanging the documents of two authors. Treating the document as a bounded feature space, one could instead exchange tweets within a document. The goal then would be to detect in which document the change of author in the author-phone pairing occurs, and where within that document it occurs.
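Locating the change inside a document could be approached as a single change-point search: given, for each tweet in order, its log-likelihood under the labeled author and under an alternative author, choose the split after which the alternative explains the remaining tweets best. This is a sketch of one possible approach, not a method from this research, and the per-tweet scores are assumed inputs:

```python
from typing import List, Optional


def find_change_point(ll_labeled: List[float], ll_other: List[float]) -> Optional[int]:
    """Return index i such that tweets i.. are better explained by the other author.

    Compares every split i (0 < i < n) against the no-change hypothesis and
    returns None when no split beats assigning all tweets to the labeled author.
    """
    n = len(ll_labeled)
    best_i, best_score = None, sum(ll_labeled)  # score of "no change"
    for i in range(1, n):
        score = sum(ll_labeled[:i]) + sum(ll_other[i:])
        if score > best_score:
            best_i, best_score = i, score
    return best_i


print(find_change_point([-1.0, -1.0, -5.0, -5.0], [-4.0, -4.0, -1.0, -1.0]))  # 2
```

An exhaustive search over split points is quadratic in document length, which is inexpensive for documents of a few tweets.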


5. Temporal Posting Aspects


The tweets used in this research were treated as independent slices of text data from an author, and tweets were selected randomly for use. In reality, tweets, and short message communication units in general, have a temporal linkage between each other, especially in a conversational context. Future work could examine the linkages between sequential tweets, and whether those linkages could be defined and exploited. Also of interest is whether these linkages can be discriminated by topic or by stylistic characteristics.


C. CONCLUDING REMARKS


This research explores a holistic view of communication as a function of a user and a device together. We explore the user-to-device binding and our ability to detect this binding as a pair. The results of this work show that it is easier to detect an author when he is bound to a device than it is to detect this author alone, with a 50% accuracy improvement in the most disparate case. Knowing that in a real-world application a security professional may have a limited number of text and phone signal data points to work with, we tested our method on data sets of various sizes, looking to find ways to elicit quality accuracy results from minimal data sets. Authorship attribution of short messages is a difficult problem, but we have shown here that there are ways to effectively accomplish it. The practical applications of this research range from law enforcement and intelligence gathering to wireless network security.
APPENDIX A: MEASURING GSM PHASE AND FREQUENCY ERRORS


Errors in signal modulation generated by a GSM transmitter cause degradation in the performance of the system. Small manufacturing variations in the electronics fabrication and assembly may cause persistent error in signal modulation and transmission. The ETSI 3GPP standards [33] and [34] impose quality standards on the allowable error for base stations and mobile stations. Manufacturers have developed quality control test mechanisms and equipment for their devices to ensure compliance with the standards and acceptable performance for users in the field. We use these mechanisms for test and identification of the mobile devices used in our experiments.


Once a call is established between the handset and the tower, the 8922S samples the uplink signal. This sampling collects the actual phase trajectory of the signal. In GMSK modulation, the signal carries bit-level data by effecting changes in carrier frequency, which cause corresponding changes in phase state. A one is represented by a carrier frequency change of +67.708 kHz, causing a phase state change of +90 degrees in the I/Q plane. A zero is represented by a carrier frequency change of -67.708 kHz, causing a phase state change of -90 degrees. The phase trajectory, then, consists of the phase state changes representing the series of data bits in the signal [37]. An error in the phase state change is reflected by a deviation from the 90 degree value. The signal analyzer collects the actual phase trajectory transmitted by the handset. It then demodulates the signal to determine the transmitted bit sequence. From the bit sequence, it calculates the ideal phase trajectory. The phase error is the difference between these two trajectories [38]. Figure 29 is a graphical representation of this process.
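The ±90 degree figure follows from the frequency deviation and the bit period: the phase advance over one bit is 360° × Δf × T_b. Assuming the standard GSM gross bit rate of 1625/6 kbit/s (about 270.833 kbit/s, a value not stated in this appendix), a quick check:

```python
BIT_RATE_HZ = 1625e3 / 6   # assumed GSM gross bit rate, about 270.833 kbit/s
DEVIATION_HZ = 67.708e3    # frequency deviation quoted in the text

bit_period_s = 1.0 / BIT_RATE_HZ
phase_step_deg = 360.0 * DEVIATION_HZ * bit_period_s
print(f"phase change per bit: {phase_step_deg:.3f} degrees")
```

Equivalently, the 67.708 kHz deviation is one quarter of the bit rate, so each bit advances or retards the carrier phase by a quarter cycle.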


Figure 29. GMSK Phase Error Measurement (From [38])


The phase error measurement forms the basis of the three error values we use in our device identification scheme. The root mean square of the error measurement is calculated and reported as the RMS phase error. The largest phase deviation from ideal is reported as the peak phase error. The frequency error is the mean slope of the error line (phase/time) [38]. Figure 30 is a graphical representation of these error measurements relative to the calculated error line.
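The three reported quantities can be sketched from sampled phase-error values: fit a straight line to phase error versus time, convert its slope to a frequency error, and summarize what remains. This sketch follows the common GSM test convention of computing RMS and peak phase error from the residuals after the fitted line is removed; that convention, the sampling interface, and the function name are assumptions rather than a description of the 8922S internals:

```python
import math
from typing import List, Tuple


def gmsk_error_metrics(t_s: List[float],
                       phase_err_deg: List[float]) -> Tuple[float, float, float]:
    """Return (RMS phase error in deg, peak phase error in deg, frequency error in Hz).

    A least-squares line is fitted to phase error vs. time; its slope (deg/s)
    divided by 360 gives the frequency error in Hz. RMS and peak phase error
    are computed on the residuals after removing that line.
    """
    n = len(t_s)
    mean_t = sum(t_s) / n
    mean_p = sum(phase_err_deg) / n
    sxx = sum((t - mean_t) ** 2 for t in t_s)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(t_s, phase_err_deg))
    slope = sxy / sxx                      # degrees per second
    intercept = mean_p - slope * mean_t
    residuals = [p - (intercept + slope * t) for t, p in zip(t_s, phase_err_deg)]
    rms = math.sqrt(sum(r * r for r in residuals) / n)
    peak = max(abs(r) for r in residuals)
    return rms, peak, slope / 360.0


# A pure 45 Hz offset produces a linear phase ramp and no residual phase error.
t = [i * 1e-6 for i in range(10)]
ramp = [360.0 * 45.0 * x for x in t]
print(gmsk_error_metrics(t, ramp))
```

A constant frequency offset thus shows up only in the frequency error, while burst-to-burst distortion of the phase trajectory shows up in the RMS and peak values.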


[Figure: example for E-GSM900, BTS, GMSK, showing the RMS phase error and the gradient of the error line as the mean frequency error, with a limit of 0.05 ppm, approximately 45 Hz.]
Figure 30. GMSK Modulation Errors and Specified Limits (From [38])


The Agilent 8922S collects the GSM signal from the handset, performs the calculations described above over the signal bursts, and reports the peak phase error, RMS phase error, and frequency error. These modulation errors provide the basis for the device identification classification experiments conducted in this thesis.
APPENDIX B: ADDITIONAL TEXT CLASSIFICATION DATA


Table 22. Confusion Matrix for 30 Tweets per Author With One Tweet per Document
Table 23. Confusion Matrix for 30 Tweets per Author With Three Tweets per Document
Table 24. Confusion Matrix for 50 Tweets per Author With One Tweet per Document
Table 25. Confusion Matrix for 50 Tweets per Author With Three Tweets per Document
Table 26. Confusion Matrix for 50 Tweets per Author With Five Tweets per Document
Table 27. Confusion Matrix for 120 Tweets per Author With One Tweet per Document
Table 28. Confusion Matrix for 120 Tweets per Author With Three Tweets per Document





Table 29. Confusion Matrix for 120 Tweets per Author With Five Tweets per Document
Table 30. Confusion Matrix for 150 Tweets per Author With One Tweet per Document
Table 31. Confusion Matrix for 150 Tweets per Author With Three Tweets per Document
Table 32. Confusion Matrix for 150 Tweets per Author With Five Tweets per Document
Table 33. Per Author Accuracy Rates for Various Total Tweets per Author and Tweets per Document

| Tweets per Author | 30 | 30 | 50 | 50 | 50 | 120 | 120 | 120 | 150 | 150 | 150 |
| Tweets per Document | 1 | 3 | 1 | 3 | 5 | 1 | 3 | 5 | 1 | 3 | 5 |
| 1388 | 0.2 | 0.6 | 0.42 | 0.824 | 0.9 | 0.4 | 0.775 | 0.958 | 0.453 | 0.76 | 0.867 |
| 1734 | 0.033 | 0.1 | 0.08 | 0.059 | 0 | 0.183 | 0.225 | 0.292 | 0.2 | 0.22 | 0.267 |
| 1921 | 0.133 | 0.2 | 0.3 | 0.294 | 0.3 | 0.508 | 0.8 | 0.875 | 0.527 | 0.82 | 0.967 |
| 2546 | 0.367 | 0.7 | 0.3 | 0.647 | 0.9 | 0.483 | 0.75 | 0.917 | 0.513 | 0.86 | 0.967 |
| 2744 | 0.333 | 0.6 | 0.4 | 0.588 | 0.9 | 0.617 | 0.85 | 1 | 0.58 | [illegible] | 1 |
| 6111 | 0.467 | 0.6 | 0.5 | 0.471 | 0.8 | 0.542 | 0.85 | 1 | 0.607 | 0.84 | 0.967 |
| 6886 | 0.3 | 0.4 | 0.26 | 0.294 | 0.4 | 0.492 | 0.725 | 0.833 | 0.507 | 0.82 | 0.933 |
APPENDIX C: TWEET SEND TIME ADDITIONAL DATA


Figure 31. Author 1045 Time Histogram
Figure 32.
Figure 33. Author 1734 Tweet Send Time Histogram


Author 1921 Time Histogram (x-axis: Hour (GMT))
Figure 34. Author 1921 Tweet Send Time Histogram

Author 2546 Time Histogram
Figure 35. Author 2546 Tweet Send Time Histogram

Author 2744 Time Histogram
Figure 36. Author 2744 Tweet Send Time Histogram

Author 3155 Time Histogram
Figure 37. Author 3155 Tweet Send Time Histogram


Author 3693 Time Histogram 
Figure 38. Author 3693 Tweet Send Time Histogram


Author 5599 Time Histogram (x-axis: Hour (GMT))
Figure 39. Author 5599 Tweet Send Time Histogram


Author 5742 Time Histogram 
Figure 40. Author 5742 Tweet Send Time Histogram


Author 6111 Time Histogram 
Figure 41. Author 6111 Tweet Send Time Histogram


Author 6886 Time Histogram 
Figure 42. Author 6886 Tweet Send Time Histogram


Author 7100 Time Histogram
Figure 43. Author 7100 Tweet Send Time Histogram


Author 7241 Time Histogram 
Figure 44. Author 7241 Tweet Send Time Histogram


Author 7754 Time Histogram
Figure 45. Author 7754 Tweet Send Time Histogram


Figure 46. Author 7958 Tweet Send Time Histogram


Author 8164 Time Histogram
Figure 47. Author 8164 Tweet Send Time Histogram


Author 8487 Time Histogram 
Figure 48. Author 8487 Tweet Send Time Histogram


Author 9417 Time Histogram
Figure 49. Author 9417 Tweet Send Time Histogram


Author 9800 Time Histogram 
Figure 50. Author 9800 Tweet Send Time Histogram
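
Each histogram in Figures 31 through 50 counts one author's tweets by hour of day in GMT. The following sketch shows one way to build such 24-bin counts. It assumes that timestamps arrive as strings in the classic Twitter `created_at` layout (for example "Wed Aug 27 13:08:45 +0000 2008"), and it treats GMT as UTC. The names `parse_created_at` and `hourly_histogram` are hypothetical. This is not the thesis's own tooling.

```python
from datetime import datetime, timezone

# Assumed input layout (hypothetical for this sketch): classic Twitter created_at.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(text):
    """Parse a created_at-style string into a timezone-aware datetime."""
    return datetime.strptime(text, CREATED_AT_FORMAT)


def hourly_histogram(timestamps):
    """Return 24 counts, one per hour of day (0-23) in GMT/UTC."""
    counts = [0] * 24
    for ts in timestamps:
        if ts.tzinfo is None:
            # Assumption: naive timestamps are already expressed in UTC.
            ts = ts.replace(tzinfo=timezone.utc)
        counts[ts.astimezone(timezone.utc).hour] += 1
    return counts


if __name__ == "__main__":
    raw = [
        "Wed Aug 27 13:08:45 +0000 2008",
        "Wed Aug 27 13:59:00 +0000 2008",
        "Thu Aug 28 09:15:00 -0400 2008",  # 09:15 at UTC-4 is 13:15 GMT
    ]
    hist = hourly_histogram(parse_created_at(s) for s in raw)
    for hour, n in enumerate(hist):
        if n:
            print(f"{hour:02d}:00 GMT  {'#' * n}")  # prints "13:00 GMT  ###"
```

Converting every timestamp to a single reference zone before binning keeps authors in different local time zones comparable on one axis. The trade-off is that local daily routines then appear shifted by each author's UTC offset.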


APPENDIX D: PHONE CLASSIFIER ADDITIONAL DATA 


Table 34. Confusion Matrix for 30 Signal Vectors per Phone With One Signal Vector per Training Set

Table 35. Confusion Matrix for 30 Signal Vectors per Phone With Two Signal Vectors per Training Set

Table 36. Confusion Matrix for 30 Signal Vectors per Phone With Three Signal Vectors per Training Set

Table 37. Confusion Matrix for 50 Signal Vectors per Phone With One Signal Vector per Training Set

Table 38. Confusion Matrix for 50 Signal Vectors per Phone With Two Signal Vectors per Training Set

Table 39. Confusion Matrix for 50 Signal Vectors per Phone With Three Signal Vectors per Training Set

Table 40. Confusion Matrix for 50 Signal Vectors per Phone With Four Signal Vectors per Training Set

Table 41. Confusion Matrix for 50 Signal Vectors per Phone With Five Signal Vectors per Training Set

Table 42. Confusion Matrix for 100 Signal Vectors per Phone With One Signal Vector per Training Set

Table 43. Confusion Matrix for 100 Signal Vectors per Phone With Two Signal Vectors per Training Set

Table 44. Confusion Matrix for 100 Signal Vectors per Phone With Three Signal Vectors per Training Set

Table 45. Confusion Matrix for 120 Signal Vectors per Phone With One Signal Vector per Training Set

Table 46. Confusion Matrix for 120 Signal Vectors per Phone With Two Signal Vectors per Training Set

Table 47. Confusion Matrix for 120 Signal Vectors per Phone With Three Signal Vectors per Training Set

Table 48. Confusion Matrix for 150 Signal Vectors per Phone With One Signal Vector per Training Set

Table 49. Confusion Matrix for 150 Signal Vectors per Phone With Two Signal Vectors per Training Set

Table 50. Confusion Matrix for 150 Signal Vectors per Phone With Three Signal Vectors per Training Set

Table 51. Per Phone Accuracy Rates for Various Total Signal Vectors per Phone and Vectors

APPENDIX E: COMBINED CLASSIFIER ADDITIONAL DATA


Table 52. Phone to Author Pairing Matrix 








[Table body not legible in the source: pairing matrices (Pmatrix) listing the author ID paired with each phone.]

Table 53. Confusion Matrix for Normalized Combined Classifier Matrix Pairing 1 Using 30 Tweets/Signal Vectors With One Tweet/Signal Vector per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 54. Confusion Matrix for Non-normalized Combined Classifier Matrix Pairing 1 Using 30 Tweets/Signal Vectors With One Tweet/Signal Vector per Training Set

Table 55. Confusion Matrix for Normalized Combined Classifier Matrix Pairing 1 Using 30 Tweets/Signal Vectors With Three Tweets/Signal Vectors per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 56. Confusion Matrix for Non-normalized Combined Classifier Matrix Pairing 1 Using 30 Tweets/Signal Vectors With Three Tweets/Signal Vectors per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |


Table 57. Confusion Matrix for Normalized Combined Classifier Matrix Pairing 1 Using 50 Tweets/Signal Vectors With One Tweet/Signal Vector per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 58. Confusion Matrix for Non-normalized Combined Classifier Matrix Pairing 1 Using 50 Tweets/Signal Vectors With One Tweet/Signal Vector per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 59. Confusion Matrix for Normalized Combined Classifier Matrix Pairing 1 Using 50 Tweets/Signal Vectors With Three Tweets/Signal Vectors per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 60. Confusion Matrix for Non-normalized Combined Classifier Matrix Pairing 1 Using 50 Tweets/Signal Vectors With Three Tweets/Signal Vectors per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 61. Confusion Matrix for Normalized Combined Classifier Matrix Pairing 1 Using 120 Tweets/Signal Vectors With One Tweet/Signal Vector per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 62. Confusion Matrix for Non-normalized Combined Classifier Matrix Pairing 1 Using 120 Tweets/Signal Vectors With One Tweet/Signal Vector per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 63. Confusion Matrix for Normalized Combined Classifier Matrix Pairing 1 Using 120 Tweets/Signal Vectors With Three Tweets/Signal Vectors per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 64. Confusion Matrix for Non-normalized Combined Classifier Matrix Pairing 1 Using 120 Tweets/Signal Vectors With Three Tweets/Signal Vectors per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 65. Confusion Matrix for Normalized Combined Classifier Matrix Pairing 1 Using 120 Tweets/Signal Vectors With Five Tweets/Signal Vectors per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 66. Confusion Matrix for Non-normalized Combined Classifier Matrix Pairing 1 Using 120 Tweets/Signal Vectors With Five Tweets/Signal Vectors per Training Set


| label -> | 1045 | 1388 | 1734 | 1921 | 2546 | 2744 | 3155 | 3693 | 5599 | 5742 | 6111 | 6886 | 7100 | 7241 | 7754 | 7958 | 8164 | 8487 | 9417 | 9800 |

Table 67. Per Pair Combined Classifier Accuracy Results by Total Tweets/Signal Vectors and Tweets/Signal Vectors per Training Set

Normalized Combination Results Per Trial, Training Set Size 1 (axis label: Accuracy)
Figure 51. Averaged Accuracy Results of Normalized Combined Classifiers for Each Phone-Author Pairing Matrix Using One Tweet per Training Set


Normalized Combination Results Per Trial, Training Set Size 3
Figure 52. Averaged Accuracy Results of Normalized Combined Classifiers for Each Phone-Author Pairing Matrix Using Three Tweets per Training Set

Combination Results Per Trial, Training Set Size 1
Figure 53. Averaged Accuracy Results of Non-Normalized Combined Classifiers for Each Phone-Author Pairing Matrix Using One Tweet per Training Set

Figure 54. Averaged Accuracy Results of Non-Normalized Combined Classifiers for Each Phone-Author Pairing Matrix Using Three Tweets per Training Set